Abstract 

We study the applicability of the Peaceman-Rachford (PR) splitting method for solving 
nonconvex optimization problems. When applied to minimizing the sum of a strongly convex 
Lipschitz differentiable function and a proper closed function, we show that if the strongly 
convex function has a large enough strong convexity modulus and the step-size parameter is 
chosen below a computable threshold, then any cluster point of the generated sequence, if it exists, 
gives a stationary point of the optimization problem. We also give sufficient 
conditions guaranteeing boundedness of the sequence generated. We then discuss one way to 
split the objective so that the proposed method can be suitably applied to solving optimization 
problems with a coercive objective that is the sum of a (not necessarily strongly) convex 
Lipschitz differentiable function and a proper closed function; this setting covers a large class 
of nonconvex feasibility problems and constrained least squares problems. Finally, we illustrate 
the proposed algorithm numerically. 


1 Introduction 

Consider the following optimization problem with competing structure: 

$$\min_u \; f(u) + g(u), \qquad (1)$$

where $f$ and $g$ are proper closed possibly nonconvex functions. Optimization problems of this 
form arise in many important modern applications such as signal processing, machine learning and 
statistics [...]. A typical application of (1) is to solve ill-posed inverse problems, 
where the function $f$ represents the data fitting term and the function $g$ is the regularization 
term. To solve problems with competing structures, an important and powerful class of algorithms 
is the class of splitting methods. In these methods, the objective function is decomposed into 
simpler components which are then processed separately in the subproblems. Two classical splitting 
methods in the literature are the Douglas-Rachford (DR) splitting method [...] and the 
Peaceman-Rachford (PR) splitting method [..., 30]. 

The PR splitting method was originally introduced in [30] for solving linear heat flow equations, 
and was later generalized to deal with nonlinear equations in [26]. In the case when $f$ and $g$ are 
both convex, the PR splitting method can be described conveniently by the following update: 

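$$x^{t+1} = (2\operatorname{prox}_{\gamma g} - I)\circ(2\operatorname{prox}_{\gamma f} - I)(x^t), \qquad (2)$$

where $I$ is the identity mapping, $\gamma > 0$ and

$$\operatorname{prox}_{\gamma h}(z) := \operatorname*{Arg\,min}_u \Big\{\gamma h(u) + \frac{1}{2}\|u - z\|^2\Big\},$$

i.e., the set of minimizers of the problem $\min_u \gamma h(u) + \frac{1}{2}\|u - z\|^2$; we note that this set is a singleton 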
when h is convex. Although the PR splitting method can be faster than the DR splitting method 
(see, for example, [...] and Example 1 in the Appendix), the PR splitting method has not been as popular 
as the DR splitting method. This is also witnessed by the fact that the PR splitting method is neither 
discussed nor mentioned in the recent monograph [5] on operator splitting methods. One of the 
main reasons for the unpopularity is that, even in the convex settings, the PR splitting method 
is not convergent in general. To guarantee convergence, typically one would require either the 
operator $2\operatorname{prox}_{\gamma f} - I$ or $2\operatorname{prox}_{\gamma g} - I$ to be a contraction mapping. In applications where $f$ and 
$g$ are both convex, this requirement typically forces $f$ or $g$ to be strongly convex, which largely 
limits the applicability of the PR splitting method; see, for example, [12, 26]. In contrast, under 
a commonly used constraint qualification which can be easily satisfied, the DR splitting method 
converges in the convex case [..., Theorem 20]. Moreover, it has recently been shown in [25] that 
the DR splitting method can be adapted to a nonconvex setting with global convergence guaranteed 
under some assumptions. This broadens the applicability of the DR splitting method to cover many 
nonconvex feasibility problems and many important nonconvex optimization problems arising in 
statistical machine learning, such as the $\ell_{1/2}$ regularized least squares problem. 
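The following minimal Python/NumPy sketch illustrates update (2). It is not code from this paper. It runs the classical PR iteration on a hypothetical convex toy problem with $f(u) = \frac{\mu}{2}\|u - a\|^2$, which is strongly convex, and $g(u) = \lambda\|u\|_1$. Both proximal mappings have closed forms. The helper names `soft_threshold` and `classical_pr` are our own.

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau*||.||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def classical_pr(a, mu, lam, gamma, x0, iters=200):
    """Classical PR update (2) for f(u) = (mu/2)||u - a||^2 and g(u) = lam*||u||_1.

    f is strongly convex, so 2*prox_{gamma f} - I is a contraction and the
    composed map in (2) has a unique fixed point.
    """
    prox_f = lambda x: (x + gamma * mu * a) / (1.0 + gamma * mu)
    prox_g = lambda x: soft_threshold(x, gamma * lam)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        r = 2.0 * prox_f(x) - x   # reflection through prox_{gamma f}
        x = 2.0 * prox_g(r) - r   # reflection through prox_{gamma g}
    return prox_f(x)              # the minimizer is recovered as prox_{gamma f}(x)

a = np.array([3.0, -0.5, 1.2, 0.1])
u = classical_pr(a, mu=2.0, lam=1.0, gamma=0.7, x0=np.zeros(4))
# For this toy problem the minimizer is soft_threshold(a, lam / mu) = [2.5, 0, 0.7, 0].
```

Here the strong convexity of $f$ is exactly the kind of condition discussed above. With $\mu = 0$ the contraction property, and with it this convergence argument, is lost.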

In this paper, to broaden the applicability of the PR splitting method, we extend it to a 
nonconvex setting. By constructing a merit function which captures the progress of the PR splitting 
method, we extend the global convergence of the PR splitting method from the known convex 
setting to the case where the objective function can be decomposed as the sum of a strongly 
convex Lipschitz differentiable function and a nonconvex function, under suitable assumptions. 
As a by-product, this extension also allows us to establish the global convergence and iteration 
complexity of a new PR splitting method for convex optimization problems in the absence of strong 
convexity. The underlying intuitive idea is that one can decompose a non-strongly convex function 
$F + G$ into the sum of a strongly convex function $f = F + \frac{\alpha}{2}\|\cdot\|^2$ and a nonconvex function 
$g = G - \frac{\alpha}{2}\|\cdot\|^2$, where $\alpha > 0$ is chosen suitably. 

The contributions of this paper are two-fold. First, we establish that, for the sequence generated 
by the PR splitting method applied to minimizing the sum of a strongly convex Lipschitz 
differentiable function and a proper closed function, if the strongly convex function has a sufficiently 
large strong convexity modulus and the step-size parameter is chosen below a computable 
threshold, then any cluster point, if it exists, gives a stationary point of the optimization 
problem. We also provide sufficient conditions to guarantee boundedness of the generated 
sequence. To our knowledge, this is the first work that studies the convergence of the PR splitting 
method for nonconvex optimization problems. Second, we demonstrate how the method can be 
suitably applied to minimizing a coercive function $F + G$, where $G$ is a proper closed function, and 
$F$ is convex Lipschitz differentiable but not necessarily strongly convex. Even in the case when 
$G$ is also convex, it was previously unknown in the literature how the PR splitting method can 
be suitably applied to this problem. Our study largely broadens the applicability of the PR splitting 
method. We also discuss the global iteration complexity of this new PR splitting method under the 
additional assumption that $G$ is convex, and establish global linear convergence of the generated 
sequence if $F + G$ is further assumed to be strongly convex. 
The rest of the paper is organized as follows. In Section 1.1 we fix the notation and recall 
some basic definitions which will be used throughout this paper. In Section 2 we establish the 
convergence of the PR splitting method for nonconvex optimization problems where the objective 
function can be decomposed as the sum of a strongly convex function and a proper closed function, 
under suitable assumptions. In Section 3 we demonstrate how the PR splitting method can be 
applied in the absence of strong convexity. In Section 4, as applications, we illustrate how the 
PR splitting method can be applied to solving two important classes of nonconvex optimization 
problems that arise in statistics and machine learning: constrained least squares problems 
and feasibility problems. We also demonstrate our approach numerically. Our concluding remarks 
are in Section 5. Finally, in the Appendix, we provide simple and concrete examples illustrating 
the different behaviors of the classical PR splitting method, the classical DR splitting method and 
our proposed PR splitting method. 


1.1 Notation 


In this paper, the $n$-dimensional Euclidean space is denoted by $\mathbb{R}^n$, with the associated inner 
product denoted by $\langle\cdot,\cdot\rangle$ and the induced norm denoted by $\|\cdot\|$. For an extended-real-valued 
function $f:\mathbb{R}^n \to (-\infty,\infty]$, we say that $f$ is proper if it is never $-\infty$ and its domain, 
$\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < \infty\}$, is nonempty. Such a function is said to be closed if it is lower semicontinuous. 
For a proper function $f$, we let $z \xrightarrow{f} x$ denote $z \to x$ and $f(z) \to f(x)$. The limiting subdifferential 
of $f$ at $x \in \operatorname{dom} f$ is defined by [...] 


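$$\partial f(x) := \Big\{u \in \mathbb{R}^n : \exists\, x^t \xrightarrow{f} x,\ u^t \to u \text{ with } \liminf_{z \to x^t,\, z \neq x^t} \frac{f(z) - f(x^t) - \langle u^t, z - x^t\rangle}{\|z - x^t\|} \ge 0 \text{ for each } t\Big\}. \qquad (3)$$

From the above definition, one immediately obtains the following robustness property:

$$\Big\{u \in \mathbb{R}^n : \exists\, x^t \xrightarrow{f} x,\ u^t \to u,\ u^t \in \partial f(x^t)\Big\} \subseteq \partial f(x). \qquad (4)$$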

The subdifferential (3) reduces to the derivative of $f$ (denoted by $\nabla f$) if $f$ is continuously differentiable, 
and to the classical subdifferential in convex analysis if $f$ is convex (see, for example, [..., Proposition 8.12]). 
For a function $f$ having more than one group of variables, we let $\partial_x f$ (resp., $\nabla_x f$) 
denote the subdifferential (resp., derivative) of $f$ with respect to the variable $x$. 

We say that a function $f$ is strongly convex with modulus $\sigma > 0$ if $f - \frac{\sigma}{2}\|\cdot\|^2$ is a 
convex function. A function $f$ is said to be coercive if $\liminf_{\|x\| \to \infty} f(x) = \infty$. For a nonempty closed 
set $S \subseteq \mathbb{R}^n$, its indicator function $\delta_S$ is defined by

$$\delta_S(x) = \begin{cases} 0 & \text{if } x \in S,\\ +\infty & \text{if } x \notin S. \end{cases}$$


We use the notation $d_S(x)$ or $\operatorname{dist}(x, S)$ to denote the distance from an $x \in \mathbb{R}^n$ to $S$, i.e., 
$d_S(x) := \inf_{y \in S}\|x - y\|$. Moreover, we use $P_S(x)$ to denote the set of points in $S$ that are closest to $x$; note that 
$P_S(x)$ is a singleton if $S$ is, in addition, convex. 

Finally, for an optimization problem $\min_x f(x)$, we use $\operatorname*{Arg\,min}_x f(x)$ to denote the set consisting 
of all its minimizers. If $\operatorname*{Arg\,min}_x f(x)$ turns out to be a singleton, we simply denote it as $\operatorname*{arg\,min}_x f(x)$. 
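The sketch below makes the notation $P_S$ and $d_S$ concrete. All helper names are hypothetical. It computes the projection onto a convex set, the Euclidean ball, where $P_S(x)$ is a singleton. It also computes one element of $P_S(x)$ for a nonconvex set, the vectors with at most $k$ nonzero entries, where ties can make $P_S(x)$ contain several points.

```python
import numpy as np

def proj_ball(x, r):
    """P_S(x) for the closed Euclidean ball S = {y : ||y|| <= r}; S is convex, so one point."""
    nrm = np.linalg.norm(x)
    return x.copy() if nrm <= r else (r / nrm) * x

def proj_sparse(x, k):
    """One element of P_S(x) for S = {y : y has at most k nonzeros}; S is nonconvex.

    Keeps the k entries of largest magnitude. With ties, other choices are
    equally close, so P_S(x) may contain several points.
    """
    y = np.zeros_like(x)
    keep = np.argsort(-np.abs(x), kind="stable")[:k]
    y[keep] = x[keep]
    return y

def dist_to_set(x, p):
    """d_S(x) = ||x - p|| for any p in P_S(x)."""
    return float(np.linalg.norm(x - p))

x = np.array([1.0, -1.0, 0.5])
p1 = proj_sparse(x, 1)               # keeps the first of the two tied entries
p2 = np.array([0.0, -1.0, 0.0])      # another nearest point in S
```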
2 Peaceman-Rachford splitting for structured nonconvex problems 

Recall that the class of problems we consider is 


$$\min_u \; f(u) + g(u), \qquad (5)$$


where $f$ and $g$ are proper closed possibly nonconvex functions. As discussed in the introduction, 
even in the case when both $f$ and $g$ are convex, one would typically need $f$ (or $g$) to be strongly 
convex to guarantee convergence of the PR splitting method. Moreover, we recall that the Lipschitz 
differentiability of $f$ played an important role in the recent convergence analysis in [25] of the closely 
related DR splitting method for (5) in the nonconvex setting. Motivated by these observations, we 
make the following blanket assumption on $f$ throughout this paper. 

Assumption 1 (Blanket assumption on $f$). The function $f$ is strongly convex with a strong 
convexity modulus at least $\sigma > 0$, and is Lipschitz differentiable so that $\nabla f$ has a Lipschitz continuity 
modulus at most $L > 0$. 

Notice that the proximal mapping $\operatorname{prox}_{\gamma f}(z)$ of a strongly convex function $f$ is well defined for 
any $\gamma > 0$ at any point $z$. Thus, in order for the iterates in (6) to be well defined, we only need to 
make the following additional blanket assumption on $g$ in this paper. 

Assumption 2 (Blanket assumption on $g$). The function $g$ is proper closed with a nonempty 
proximal mapping $\operatorname{prox}_{\gamma g}(z)$ for any $z$ and for the $\gamma > 0$ we use in the algorithm. 

Under the blanket assumptions, we consider the following adaptation of the PR splitting method 
to solve the possibly nonconvex problem (5), which can be easily shown to be equivalent to (2) in 
the case when $f$ and $g$ are convex (so that the proximal mappings are single-valued). 


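PR splitting method 
Step 0. Input $x^0$ and $\gamma > 0$. 
Step 1. Set 

$$\begin{aligned}
y^{t+1} &= \operatorname*{arg\,min}_y \Big\{ f(y) + \frac{1}{2\gamma}\|y - x^t\|^2 \Big\},\\
z^{t+1} &\in \operatorname*{Arg\,min}_z \Big\{ g(z) + \frac{1}{2\gamma}\|2y^{t+1} - x^t - z\|^2 \Big\},\\
x^{t+1} &= x^t + 2(z^{t+1} - y^{t+1}).
\end{aligned} \qquad (6)$$

Step 2. If a termination criterion is not met, go to Step 1. 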
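A minimal NumPy sketch of (6) follows. It assumes the user supplies the two proximal mappings as callables; `prox_f` and `prox_g` are hypothetical names. For a nonconvex $g$, any element of the Argmin may be returned. The stopping rule on $\|x^{t+1} - x^t\|$ is our own choice of termination criterion, not one prescribed in the text.

```python
import numpy as np

def pr_splitting(prox_f, prox_g, x0, gamma, iters=1000, tol=1e-10):
    """Sketch of the PR splitting method (6).

    prox_f(v, gamma): the unique minimizer of f(y) + ||y - v||^2 / (2*gamma).
    prox_g(v, gamma): one element of Argmin_z g(z) + ||z - v||^2 / (2*gamma).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = prox_f(x, gamma)                 # y-update
        z = prox_g(2.0 * y - x, gamma)       # z-update, evaluated at the point 2y - x
        x_new = x + 2.0 * (z - y)            # x-update
        done = np.linalg.norm(x_new - x) < tol
        x = x_new
        if done:
            break
    return y, z, x

def proj_sparse(v, k):
    """One element of the projection onto {z : at most k nonzeros}."""
    z = np.zeros_like(v)
    keep = np.argsort(-np.abs(v), kind="stable")[:k]
    z[keep] = v[keep]
    return z

# Usage: f(y) = 0.5*||y - b||^2 and g = indicator of {z : at most 2 nonzeros}.
b = np.array([3.0, -2.0, 0.5, 0.1])
prox_f = lambda v, gam: (v + gam * b) / (1.0 + gam)
prox_g = lambda v, gam: proj_sparse(v, 2)   # prox of an indicator is a projection
y, z, x = pr_splitting(prox_f, prox_g, np.zeros(4), gamma=0.5)
```

In this usage example, $f(y) = \frac{1}{2}\|y - b\|^2$ has $\sigma = L = 1$, so $3\sigma > 2L$ holds. The value $\gamma = 0.5$ lies below the threshold $(3\sigma - 2L)/L^2 = 1$ that appears in Theorem 1 below.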


Our convergence analysis follows a similar line of arguments (with some intricate modifications) 
to that used for the Douglas-Rachford splitting method in our recent work [25], and 
makes extensive use of the following merit function:

$$\mathcal{P}_\gamma(y, z, x) := \mathcal{D}_\gamma(y, z, x) - \frac{1}{\gamma}\|y - z\|^2, \qquad (7)$$
where $\mathcal{D}_\gamma$ is the so-called Douglas-Rachford merit function given by 
$\mathcal{D}_\gamma(y,z,x) = f(y) + g(z) - \frac{1}{2\gamma}\|y - z\|^2 + \frac{1}{\gamma}\langle x - y, z - y\rangle$ (see [25, Definition 2.1]), motivated by [29, Eq. 35]. 

Before proceeding, we make two important observations. First, it is not hard to see that the 
merit function can alternatively be written as 




$$\begin{aligned}
\mathcal{P}_\gamma(y,z,x) &= f(y) + g(z) + \frac{1}{2\gamma}\|2y - z - x\|^2 - \frac{1}{2\gamma}\|x - y\|^2 - \frac{2}{\gamma}\|y - z\|^2\\
&= f(y) + g(z) + \frac{1}{2\gamma}\big(\|x - y\|^2 - \|x - z\|^2 - 2\|y - z\|^2\big),
\end{aligned} \qquad (8)$$

where the first relation follows from the elementary relation $\langle u, v\rangle = \frac{1}{2}(\|u + v\|^2 - \|u\|^2 - \|v\|^2)$ 
applied with $u = x - y$ and $v = z - y$ in (7), while the second relation is obtained by using the 
elementary relation $\langle u, v\rangle = \frac{1}{2}(\|u\|^2 + \|v\|^2 - \|u - v\|^2)$ in (7) with $u = x - y$ and $v = z - y$. We will 
make use of these equivalent formulations in the convergence analysis. Second, by using 
the optimality conditions for the $y$- and $z$-updates in (6), we note that:


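$$\begin{aligned}
0 &= \nabla f(y^{t+1}) + \frac{1}{\gamma}(y^{t+1} - x^t),\\
0 &\in \partial g(z^{t+1}) + \frac{1}{\gamma}(z^{t+1} - y^{t+1}) - \frac{1}{\gamma}(y^{t+1} - x^t),
\end{aligned} \qquad (9)$$

where we made use of the subdifferential calculus rule [..., Exercise 8.8]. Consequently (substituting the first relation in (9), written at iteration $t$, into the second), for all 
$t \ge 1$,

$$0 \in \nabla f(y^t) + \partial g(z^t) + \frac{1}{\gamma}(z^t - y^t). \qquad (10)$$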
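Identity (8) follows algebraically from (7) and the definition of $\mathcal{D}_\gamma$, so it can be sanity-checked numerically. The sketch below evaluates (7) and both expressions in (8) at random points, using arbitrary hypothetical choices of $f$ and $g$. The three values should agree up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 0.3

# Arbitrary (hypothetical) f and g: (8) holds for any choice because it is algebraic.
def f(y):
    return 0.5 * float(np.dot(y, y))

def g(z):
    return float(np.abs(z).sum())

def sq(v):
    return float(np.dot(v, v))

def D(y, z, x):
    """Douglas-Rachford merit function D_gamma."""
    return f(y) + g(z) - sq(y - z) / (2 * gamma) + float(np.dot(x - y, z - y)) / gamma

def P(y, z, x):
    """Merit function (7)."""
    return D(y, z, x) - sq(y - z) / gamma

def P_first(y, z, x):
    """First expression in (8)."""
    return (f(y) + g(z) + sq(2 * y - z - x) / (2 * gamma)
            - sq(x - y) / (2 * gamma) - 2 * sq(y - z) / gamma)

def P_second(y, z, x):
    """Second expression in (8)."""
    return f(y) + g(z) + (sq(x - y) - sq(x - z) - 2 * sq(y - z)) / (2 * gamma)

y, z, x = rng.standard_normal((3, n))
values = (P(y, z, x), P_first(y, z, x), P_second(y, z, x))
```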

To establish convergence and characterize the cluster points of the generated sequence, we will 
subsequently show that $\lim_{t\to\infty}\|z^t - y^t\| = 0$ and that $g$ is “continuous” at the cluster point along 
the generated sequence. 

We are now ready to state and prove a convergence result for the PR splitting method (6). 
We would like to point out that our proof follows exactly the same line of arguments as 
[25, Theorem 1]. However, there are two crucial differences. First, we now make use of the 
merit function (7) in place of the Douglas-Rachford merit function. Second, as we will see in 
the upper estimate in (20), the factor of $\gamma$ in the denominator is canceled, and thus the strong 
convexity modulus $\sigma$ comes into play in establishing the nonincreasing property of the sequence 
$\{\mathcal{P}_\gamma(y^t, z^t, x^t)\}_{t \ge 1}$. 


Theorem 1 (Global subsequential convergence). Suppose that $3\sigma > 2L$ and the parameter $\gamma$ 
is chosen so that

$$0 < \gamma < \frac{3\sigma - 2L}{L^2}. \qquad (11)$$

Then the sequence $\{\mathcal{P}_\gamma(y^t, z^t, x^t)\}_{t \ge 1}$ is nonincreasing. Moreover, if a cluster point $(y^*, z^*, x^*)$ of 
the sequence $\{(y^t, z^t, x^t)\}$ exists, then we have

$$\lim_{t\to\infty}\|x^{t+1} - x^t\| = 2\lim_{t\to\infty}\|z^{t+1} - y^{t+1}\| = 0, \qquad (12)$$

the cluster point satisfies $z^* = y^*$, and

$$0 \in \nabla f(z^*) + \partial g(z^*).$$


Remark 1. We note that the condition $3\sigma > 2L$ indicates that this convergence result can only 
be applied when $f$ has a relatively large strong convexity modulus, i.e., when $\sigma > \frac{2}{3}L$. This seems 
restrictive at first glance, but we will demonstrate in the next section how this theorem can be 
applied in a wide range of problems that do not explicitly contain a strongly convex part in the 
objective. Specifically, we will show that the method can be suitably applied to minimizing a coercive 
function $F + G$, where $G$ is a proper closed function and $F$ is convex Lipschitz differentiable but 
not necessarily strongly convex; see Corollary 1. 
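Consider a quadratic $f(y) = \frac{1}{2}y^\top Q y - b^\top y$ with $Q$ symmetric positive definite; this is an illustrative assumption. For such $f$, one has $\sigma = \lambda_{\min}(Q)$ and $L = \lambda_{\max}(Q)$. The hypothesis $3\sigma > 2L$ and the step-size threshold in (11) can then be computed directly, as in this small sketch. The helper `pr_step_bound` is hypothetical.

```python
import numpy as np

def pr_step_bound(Q):
    """Return (sigma, L, bound) for f(y) = 0.5*y'Qy - b'y with Q symmetric positive definite.

    bound = (3*sigma - 2*L) / L**2 is the right-hand side of (11); it is None
    when 3*sigma <= 2*L, in which case Theorem 1 does not apply to this f.
    """
    eigs = np.linalg.eigvalsh(Q)        # eigenvalues in ascending order
    sigma, L = eigs[0], eigs[-1]
    if 3 * sigma <= 2 * L:
        return sigma, L, None
    return sigma, L, (3 * sigma - 2 * L) / L**2

ok = pr_step_bound(np.diag([1.0, 1.2]))    # 3*1.0 > 2*1.2, bound = 0.6 / 1.44
bad = pr_step_bound(np.diag([1.0, 2.0]))   # 3*1.0 <= 2*2.0, no admissible gamma
```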

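Proof. We study the behavior of $\mathcal{P}_\gamma$ along the sequence generated from the PR splitting method. 
First, using (7) and the definition of the $x$-update, we see that

$$\mathcal{P}_\gamma(y^{t+1}, z^{t+1}, x^{t+1}) - \mathcal{P}_\gamma(y^{t+1}, z^{t+1}, x^t) = \frac{1}{\gamma}\langle x^{t+1} - x^t, z^{t+1} - y^{t+1}\rangle = \frac{1}{2\gamma}\|x^{t+1} - x^t\|^2. \qquad (13)$$

Second, making use of the first relation in (8) and the definition of $z^{t+1}$ as a minimizer, we have

$$\begin{aligned}
&\mathcal{P}_\gamma(y^{t+1}, z^{t+1}, x^t) - \mathcal{P}_\gamma(y^{t+1}, z^t, x^t)\\
&= g(z^{t+1}) + \frac{1}{2\gamma}\|2y^{t+1} - z^{t+1} - x^t\|^2 - \frac{2}{\gamma}\|y^{t+1} - z^{t+1}\|^2 - g(z^t) - \frac{1}{2\gamma}\|2y^{t+1} - z^t - x^t\|^2 + \frac{2}{\gamma}\|y^{t+1} - z^t\|^2\\
&\le \frac{2}{\gamma}\big(\|y^{t+1} - z^t\|^2 - \|y^{t+1} - z^{t+1}\|^2\big) = \frac{2}{\gamma}\|y^{t+1} - z^t\|^2 - \frac{1}{2\gamma}\|x^{t+1} - x^t\|^2,
\end{aligned} \qquad (14)$$

where the last relation is due to the definition of $x^{t+1}$. Consequently, summing (13) and (14), we 
have

$$\mathcal{P}_\gamma(y^{t+1}, z^{t+1}, x^{t+1}) - \mathcal{P}_\gamma(y^{t+1}, z^t, x^t) \le \frac{2}{\gamma}\|y^{t+1} - z^t\|^2. \qquad (15)$$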
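Next, making use of the second relation in (8), we see that

$$\begin{aligned}
&\mathcal{P}_\gamma(y^{t+1}, z^t, x^t) - \mathcal{P}_\gamma(y^t, z^t, x^t)\\
&= f(y^{t+1}) + \frac{1}{2\gamma}\|x^t - y^{t+1}\|^2 - f(y^t) - \frac{1}{2\gamma}\|x^t - y^t\|^2 - \frac{1}{\gamma}\|y^{t+1} - z^t\|^2 + \frac{1}{\gamma}\|y^t - z^t\|^2\\
&\le -\frac{1}{2}\Big(\frac{1}{\gamma} + \sigma\Big)\|y^{t+1} - y^t\|^2 - \frac{1}{\gamma}\|y^{t+1} - z^t\|^2 + \frac{1}{\gamma}\|y^t - z^t\|^2,
\end{aligned} \qquad (16)$$

where, in the last inequality, we used the definition of $y^{t+1}$ as a minimizer and the strong convexity 
(with modulus $\sigma + \frac{1}{\gamma}$) of the objective in the minimization problem that defines the $y$-update. Combining (16) with (15) 
gives further that

$$\mathcal{P}_\gamma(y^{t+1}, z^{t+1}, x^{t+1}) - \mathcal{P}_\gamma(y^t, z^t, x^t) \le -\frac{1}{2}\Big(\frac{1}{\gamma} + \sigma\Big)\|y^{t+1} - y^t\|^2 + \frac{1}{\gamma}\|y^{t+1} - z^t\|^2 + \frac{1}{\gamma}\|y^t - z^t\|^2. \qquad (17)$$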


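To further upper estimate (17), observe from the first relation in (9) that

$$\nabla f(y^{t+1}) = \frac{1}{\gamma}(x^t - y^{t+1}).$$

Since $f$ is strongly convex with modulus $\sigma > 0$ by assumption, we see that for all $t \ge 1$,

$$\Big\langle \frac{1}{\gamma}(x^t - y^{t+1}) - \frac{1}{\gamma}(x^{t-1} - y^t),\, y^{t+1} - y^t \Big\rangle \ge \sigma\|y^{t+1} - y^t\|^2,$$

that is,

$$\langle x^t - x^{t-1}, y^{t+1} - y^t\rangle \ge (1 + \gamma\sigma)\|y^{t+1} - y^t\|^2.$$

Thus, making use of the definition of $x^t$ and the above relation, we obtain further that

$$\begin{aligned}
\|y^{t+1} - z^t\|^2 &= \|y^{t+1} - y^t + y^t - z^t\|^2 = \Big\|y^{t+1} - y^t - \frac{1}{2}(x^t - x^{t-1})\Big\|^2\\
&= \|y^{t+1} - y^t\|^2 - \langle y^{t+1} - y^t, x^t - x^{t-1}\rangle + \frac{1}{4}\|x^t - x^{t-1}\|^2\\
&\le -\gamma\sigma\|y^{t+1} - y^t\|^2 + \frac{1}{4}\|x^t - x^{t-1}\|^2.
\end{aligned} \qquad (18)$$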
In addition, observe also from the definition of the $x$-update, the first relation in (9) and the 
Lipschitz continuity of $\nabla f$ that for $t \ge 1$

$$2\|y^t - z^t\| = \|x^t - x^{t-1}\| \le (1 + \gamma L)\|y^{t+1} - y^t\|. \qquad (19)$$

Combining (18), (19) with (17), we conclude that for any $t \ge 1$

$$\begin{aligned}
\mathcal{P}_\gamma(y^{t+1}, z^{t+1}, x^{t+1}) - \mathcal{P}_\gamma(y^t, z^t, x^t) &\le \frac{1}{2\gamma}\big((1 + \gamma L)^2 - 3\gamma\sigma - 1\big)\|y^{t+1} - y^t\|^2\\
&= \frac{1}{2}\big(-3\sigma + 2L + \gamma L^2\big)\|y^{t+1} - y^t\|^2.
\end{aligned} \qquad (20)$$


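By our choice of $\gamma$, $-3\sigma + 2L + \gamma L^2 < 0$. From this we see immediately that $\{\mathcal{P}_\gamma(y^t, z^t, x^t)\}$ is 
nonincreasing. Summing (20) from $t = 1$ to $N - 1 \ge 1$, we obtain that

$$\mathcal{P}_\gamma(y^N, z^N, x^N) - \mathcal{P}_\gamma(y^1, z^1, x^1) \le \frac{1}{2}\big(-3\sigma + 2L + \gamma L^2\big)\sum_{t=1}^{N-1}\|y^{t+1} - y^t\|^2. \qquad (21)$$

The closedness of $\mathcal{P}_\gamma$ and the existence of cluster points imply that the nonincreasing sequence 
$\{\mathcal{P}_\gamma(y^t, z^t, x^t)\}$ is bounded below. Using this, we conclude immediately from 
(21) that $\lim_{t\to\infty}\|y^{t+1} - y^t\| = 0$. Combining this with (19), we conclude that (12) holds. Furthermore, 
combining these with the third relation in (6), we obtain further that $\lim_{t\to\infty}\|z^{t+1} - z^t\| = 0$. 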

Consequently, if $(y^*, z^*, x^*)$ is a cluster point of $\{(y^t, z^t, x^t)\}$ with a convergent subsequence 
$\{(y^{t_j}, z^{t_j}, x^{t_j})\}$ such that $\lim_{j\to\infty}(y^{t_j}, z^{t_j}, x^{t_j}) = (y^*, z^*, x^*)$, then we must have

$$\lim_{j\to\infty}(y^{t_j}, z^{t_j}, x^{t_j}) = \lim_{j\to\infty}(y^{t_j - 1}, z^{t_j - 1}, x^{t_j - 1}) = (y^*, z^*, x^*). \qquad (22)$$

Since $z^t$ is a minimizer of the subproblem,

$$g(z^t) + \frac{1}{2\gamma}\|2y^t - z^t - x^{t-1}\|^2 \le g(z^*) + \frac{1}{2\gamma}\|2y^t - z^* - x^{t-1}\|^2.$$

Taking the limit along the convergent subsequence and using (22) yields

$$\limsup_{j\to\infty} g(z^{t_j}) \le g(z^*).$$

Conversely, we have $\liminf_{j\to\infty} g(z^{t_j}) \ge g(z^*)$ by the lower semicontinuity of $g$. Thus,

$$\lim_{j\to\infty} g(z^{t_j}) = g(z^*). \qquad (23)$$

Using (12), (22), (23) and the robustness property (4), and passing to the limit in (10) along the convergent subsequence above, we 
conclude that the cluster point gives a stationary point of (5), i.e., $y^* = z^*$ and

$$0 \in \nabla f(z^*) + \partial g(z^*).$$


This completes the proof. □ 
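The conclusions of Theorem 1 can be observed numerically. The sketch below runs (6) on a hypothetical nonconvex instance. It takes $f(y) = \frac{1}{2}\|y - b\|^2$, so $\sigma = L = 1$, and lets $g$ be the indicator function of the vectors with at most $k$ nonzeros. The script records $\mathcal{P}_\gamma(y^t, z^t, x^t)$ through the second expression in (8). It then checks that this merit value is nonincreasing and that $\|z^t - y^t\|$ becomes small. This is an illustration under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, gamma = 8, 3, 0.5            # sigma = L = 1, so (11) requires 0 < gamma < 1
b = rng.standard_normal(n)

def proj_sparse(v):
    """One element of the projection onto {z : at most k nonzeros}."""
    z = np.zeros_like(v)
    keep = np.argsort(-np.abs(v), kind="stable")[:k]
    z[keep] = v[keep]
    return z

def merit(y, z, x):
    """Second expression in (8) with f(y) = 0.5*||y - b||^2; g(z) = 0 on the sparse set."""
    sq = lambda v: float(np.dot(v, v))
    return 0.5 * sq(y - b) + (sq(x - y) - sq(x - z) - 2.0 * sq(y - z)) / (2.0 * gamma)

x = rng.standard_normal(n)
merits, gaps = [], []
for _ in range(200):
    y = (x + gamma * b) / (1.0 + gamma)   # y-update of (6)
    z = proj_sparse(2.0 * y - x)          # z-update of (6)
    x = x + 2.0 * (z - y)                 # x-update of (6)
    merits.append(merit(y, z, x))         # P_gamma(y^t, z^t, x^t) for t = 1, 2, ...
    gaps.append(float(np.linalg.norm(z - y)))

nonincreasing = all(m_next <= m + 1e-10 for m, m_next in zip(merits, merits[1:]))
```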

In the next theorem, we study sufficient conditions to guarantee boundedness of the sequence 
generated from the PR splitting method. Thus, a cluster point will necessarily exist under these 
conditions. 

Theorem 2 (Boundedness of sequence). Suppose that $3\sigma > 2L$ and $\gamma$ is chosen to satisfy 
(11). Suppose in addition that $f + g$ is coercive, i.e., $\liminf_{\|u\|\to\infty}(f + g)(u) = \infty$. Then the 
sequence $\{(y^t, z^t, x^t)\}$ generated from (6) is bounded. 
Proof. Recall from Theorem 1 that the merit function is nonincreasing along the sequence generated 
from (6). In particular,

$$\mathcal{P}_\gamma(y^1, z^1, x^1) \ge \mathcal{P}_\gamma(y^t, z^t, x^t) \qquad (24)$$

whenever $t \ge 1$, where

$$\mathcal{P}_\gamma(y^t, z^t, x^t) = f(y^t) + g(z^t) - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2 - \frac{1}{\gamma}\|y^t - z^t\|^2 \qquad (25)$$

from the second relation in (8). Next, recall from the definition of the $x$-update that $x^t = x^{t-1} + 2(z^t - y^t)$, 
which together with the first relation in (9) gives

$$\nabla f(y^t) = \frac{1}{\gamma}(x^{t-1} - y^t) = \frac{1}{\gamma}\big([x^t - z^t] - [z^t - y^t]\big). \qquad (26)$$


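Moreover, since the gradient of $f$ is Lipschitz continuous with modulus $L$, we have

$$f(z^t) \le f(y^t) + \langle \nabla f(y^t), z^t - y^t\rangle + \frac{L}{2}\|z^t - y^t\|^2. \qquad (27)$$

Combining these with (24) and (25), we see further that

$$\begin{aligned}
\mathcal{P}_\gamma(y^1, z^1, x^1) &\ge f(y^t) + g(z^t) - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2 - \frac{1}{\gamma}\|y^t - z^t\|^2\\
&\ge f(z^t) + g(z^t) - \langle \nabla f(y^t), z^t - y^t\rangle - \frac{L}{2}\|z^t - y^t\|^2 - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2 - \frac{1}{\gamma}\|y^t - z^t\|^2\\
&= f(z^t) + g(z^t) - \frac{1}{\gamma}\langle x^t - z^t, z^t - y^t\rangle - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2 - \frac{L}{2}\|y^t - z^t\|^2\\
&= f(z^t) + g(z^t) + \frac{1}{2}\Big(\frac{1}{\gamma} - L\Big)\|y^t - z^t\|^2,
\end{aligned} \qquad (28)$$

where the first inequality follows from (24) and (25), the second inequality follows from (27), the first 
equality follows from (26), while the last equality follows from the elementary relation 
$\langle u, v\rangle = \frac{1}{2}(\|u + v\|^2 - \|u\|^2 - \|v\|^2)$ applied to $u = x^t - z^t$ and $v = z^t - y^t$. From (28), the coerciveness 
of $f + g$ and the fact that $\gamma < \frac{3\sigma - 2L}{L^2} \le \frac{1}{L}$ (recall that $\sigma \le L$), we conclude 
that $\{z^t\}$ and $\{y^t\}$ are bounded. The boundedness of $\{x^t\}$ now follows from these and the first 
relation in (9). This completes the proof. □ 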


Remark 2 (Comments on the proof of Theorem 2). (i) The technique of using (27) for 
establishing (28) was also used previously in [..., Lemma 3.3] for showing that the augmented 
Lagrangian function is bounded below along the sequence generated from the alternating direction 
method of multipliers for a special class of problems. Here, we applied the technique 
to the new merit function $\mathcal{P}_\gamma$. 

(ii) The same technique can be applied to establishing the boundedness of the sequence 
generated by the DR splitting method studied in [25] under a condition which is slightly weaker 
than the one used in [25]. In fact, one can show that the DR splitting method in [25] generates 
a bounded sequence under the blanket assumptions on $f$ and $g$ in [25, Section 3], the condition 
that $f + g$ is coercive and the choice of parameter specified in [25, Theorem 4]¹. 

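To see this, recall that for the DR splitting method, we also have $\nabla f(y^t) = \frac{1}{\gamma}(x^{t-1} - y^t)$ but 
have $x^t = x^{t-1} + (z^t - y^t)$ instead of the third relation in (6). Thus, $\nabla f(y^t) = \frac{1}{\gamma}(x^t - z^t)$, 
and we have the following estimate for the DR merit function, making use of (27):

$$\begin{aligned}
\mathcal{D}_\gamma(y^t, z^t, x^t) &= f(y^t) + g(z^t) - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2\\
&\ge f(z^t) + g(z^t) - \langle \nabla f(y^t), z^t - y^t\rangle - \frac{L}{2}\|z^t - y^t\|^2 - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2\\
&= f(z^t) + g(z^t) - \frac{1}{\gamma}\langle x^t - z^t, z^t - y^t\rangle - \frac{L}{2}\|z^t - y^t\|^2 - \frac{1}{2\gamma}\|x^t - z^t\|^2 + \frac{1}{2\gamma}\|x^t - y^t\|^2\\
&= f(z^t) + g(z^t) + \frac{1}{2}\Big(\frac{1}{\gamma} - L\Big)\|y^t - z^t\|^2,
\end{aligned}$$

where the last equality follows from the elementary relation $\langle u, v\rangle = \frac{1}{2}(\|u + v\|^2 - \|u\|^2 - \|v\|^2)$ 
applied to $u = x^t - z^t$ and $v = z^t - y^t$. The boundedness of the sequence can then be deduced 
under the choice of $\gamma$ in [25, Theorem 4], which guarantees $\gamma < \frac{1}{L}$, and the assumption that 
$f + g$ is coercive. 

¹This slightly improves [25, Theorem 4] because [25, Theorem 4] assumed a slightly stronger condition that $f$ 
and $g$ are bounded below and one of them is coercive. 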

As in [23, Theorem 4] and [25, Theorem 2], one can also show that the whole generated sequence 
is convergent under the additional assumption that $\mathcal{P}_\gamma(y, z, x)$ is a KL function². To this end, note 
that for any $t \ge 1$, we have from (7) and the third relation in (6) that

$$\nabla_x \mathcal{P}_\gamma(y^t, z^t, x^t) = \frac{1}{\gamma}(z^t - y^t) = \frac{1}{2\gamma}(x^t - x^{t-1}). \qquad (29)$$

Moreover, using the second relation in (8), one can obtain

$$\nabla_y \mathcal{P}_\gamma(y^t, z^t, x^t) = \nabla f(y^t) + \frac{1}{\gamma}(y^t - x^t) - \frac{2}{\gamma}(y^t - z^t) = \frac{1}{\gamma}(x^{t-1} - x^t) - \frac{2}{\gamma}(y^t - z^t) = 0, \qquad (30)$$

where the second equality follows from the first relation in (9) and the last equality follows again 
from the third relation in (6). Finally, using the second relation in (8), one can compute that

$$\begin{aligned}
\partial_z \mathcal{P}_\gamma(y^t, z^t, x^t) &= \partial g(z^t) - \frac{1}{\gamma}(z^t - x^t) - \frac{2}{\gamma}(z^t - y^t)\\
&= \partial g(z^t) + \frac{1}{\gamma}(z^t - y^t) - \frac{1}{\gamma}(y^t - x^{t-1}) - \frac{1}{\gamma}(z^t - y^t) + \frac{1}{\gamma}(y^t - x^{t-1}) - \frac{1}{\gamma}(z^t - x^t) - \frac{2}{\gamma}(z^t - y^t)\\
&\ni -\frac{4}{\gamma}(z^t - y^t) + \frac{1}{\gamma}(x^t - x^{t-1}) = -\frac{1}{\gamma}(x^t - x^{t-1}),
\end{aligned} \qquad (31)$$

where the inclusion follows from the second relation in (9) and the last equality follows from the 
third relation in (6). Consequently, by combining (29), (30), (31) and (19), we see the existence of 
$K > 0$ so that

$$\operatorname{dist}\big(0, \partial \mathcal{P}_\gamma(y^t, z^t, x^t)\big) \le K\|y^{t+1} - y^t\|.$$
Using this, (20) and following the arguments in the proof of [25, Theorem 2], it is not hard to 
prove the following result. We omit the detailed proof here. 

Theorem 3 (Global convergence of the whole sequence). Suppose that $3\sigma > 2L$, the parameter 
$\gamma > 0$ is chosen as in (11), and the sequence $\{(y^t, z^t, x^t)\}$ generated from (6) has a cluster 
point $(y^*, z^*, x^*)$. Suppose also that $\mathcal{P}_\gamma$ is a KL function. Then the whole sequence $\{(y^t, z^t, x^t)\}$ 
is convergent. 


²We refer the readers to, for example, [1, 2, 7, 8] for the definition and examples of KL functions. In particular, 
if $f$ and $g$ are proper closed semi-algebraic functions, then $\mathcal{P}_\gamma$ is a KL function for any $\gamma > 0$. 
As we have seen from Theorems 1 and 2, our convergence analysis of the PR splitting method 
requires that the nonconvex objective function can be decomposed as $f + g$ where $f$ is strongly 
convex. It should be noted that if the strong convexity assumption on $f$ is dropped, then the 
generated sequence need not converge to, or cluster at, a stationary point even when $g$ 
is also convex. On the other hand, in the next section, we will demonstrate how the method can 
be suitably applied to minimizing a coercive function $F + G$, where $G$ is a proper closed function 
and $F$ is convex Lipschitz differentiable but not necessarily strongly convex. 

3 Peaceman-Rachford splitting methods for nonconvex problems with non-strongly convex decomposition 

In many applications, the underlying optimization problem can be formulated as 

$$\min_u \; F(u) + G(u), \qquad (32)$$

where $F + G$ is coercive, $F$ is a convex smooth function with a Lipschitz continuous gradient whose 
modulus is at most $L_F > 0$, and $G$ is a proper closed function with a nonempty proximal 
mapping $\operatorname{prox}_{\tau G}(z)$ for any $z$ and any $\tau > 0$. For example, when $F$ is the least squares loss 
function for linear regression and $G$ is the indicator function of an $\ell_1$ norm ball, problem 
(32) reduces to the LASSO [35]. This and various related (possibly nonconvex) models have been 
studied extensively in the statistical literature; see, for example, [...]. We will also 
provide more concrete examples and simulation results later in Section 4. 

In view of the structure of (32), a natural way of applying a splitting method would be to 
set $f(y) = F(y)$ and $g(z) = G(z)$. However, since this choice of $f$ is not strongly convex in general, our 
convergence theory in Section 2 cannot be applied to deduce convergence of the resulting PR 
splitting method. 

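Thus, we consider an alternative way of splitting the objective in order to obtain a strongly 
convex $f$. To this end, we start by fixing any $\alpha > 0$ and defining $f(y) = F(y) + \frac{\alpha}{2}\|y\|^2$ and 
$g(z) = G(z) - \frac{\alpha}{2}\|z\|^2$. Then $\nabla f$ is Lipschitz continuous with a modulus at most $L = L_F + \alpha$, and $f$ is 
strongly convex with modulus at least $\sigma = \alpha$. Thus, one only needs to pick $\alpha > 2L_F$ so that 
$3\sigma > 2L$. Let $\alpha = \beta L_F$ for some $\beta > 2$. Then the upper bound of $\gamma$ in (11) is given by

$$\frac{3\sigma - 2L}{L^2} = \frac{\alpha - 2L_F}{(L_F + \alpha)^2} = \frac{\beta - 2}{(\beta + 1)^2 L_F}.$$

Consequently, if we set

$$f(y) = F(y) + \frac{\beta L_F}{2}\|y\|^2 \quad\text{and}\quad g(z) = G(z) - \frac{\beta L_F}{2}\|z\|^2,$$

then we can pick $0 < \gamma < \frac{\beta - 2}{(\beta + 1)^2 L_F}$³. Moreover, for this choice of $\gamma$, Assumption 2 is satisfied 
for the above choice of $g$. Indeed, $\gamma < \frac{1}{\beta L_F}$, so evaluating $\operatorname{prox}_{\gamma g}$ amounts to evaluating 
$\operatorname{prox}_{\tau G}$ with $\tau = \frac{\gamma}{1 - \beta\gamma L_F} > 0$ at a rescaled point. Hence, it follows from Theorem 2 that the sequence 
generated by applying the PR splitting method to this pair of $f$ and $g$ is bounded, and then any cluster point gives 
a stationary point of (32), according to Theorem 1. For concreteness and easy reference in our 
subsequent discussion, we present this algorithm explicitly below: 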


³One natural choice of $\beta$ is $\beta = 5$, at which $\max_{\beta > 2}\frac{\beta - 2}{(\beta + 1)^2 L_F}$ is attained. However, we discover 
in our numerical experiments that a smaller $\beta > 2$ coupled with a suitable heuristic for updating $\gamma$ leads to faster 
convergence. 
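The choice $\beta = 5$ in the footnote above can be verified by differentiating the step-size bound with respect to $\beta$:

```latex
\frac{d}{d\beta}\left(\frac{\beta - 2}{(\beta + 1)^2}\right)
  = \frac{(\beta + 1)^2 - 2(\beta - 2)(\beta + 1)}{(\beta + 1)^4}
  = \frac{5 - \beta}{(\beta + 1)^3},
\qquad
\max_{\beta > 2}\frac{\beta - 2}{(\beta + 1)^2 L_F} = \frac{5 - 2}{36 L_F} = \frac{1}{12 L_F}.
```

Hence the bound increases on $(2, 5)$ and decreases on $(5, \infty)$.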
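PR splitting method for (32) 
Step 0. Input $x^0$, $\beta > 2$ and $\gamma \in \Big(0, \frac{\beta - 2}{(\beta + 1)^2 L_F}\Big)$. 
Step 1. Set 

$$\begin{aligned}
y^{t+1} &= \operatorname*{arg\,min}_y \Big\{ F(y) + \frac{\beta L_F}{2}\|y\|^2 + \frac{1}{2\gamma}\|y - x^t\|^2 \Big\},\\
z^{t+1} &\in \operatorname*{Arg\,min}_z \Big\{ G(z) - \frac{\beta L_F}{2}\|z\|^2 + \frac{1}{2\gamma}\|2y^{t+1} - x^t - z\|^2 \Big\},\\
x^{t+1} &= x^t + 2(z^{t+1} - y^{t+1}).
\end{aligned} \qquad (33)$$

Step 2. If a termination criterion is not met, go to Step 1. 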
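The sketch below instantiates (33) for a hypothetical sparsity-constrained least squares problem. It takes $F(u) = \frac{1}{2}\|Au - b\|^2$, so $L_F = \lambda_{\max}(A^\top A)$, and lets $G$ be the indicator function of $\{u : u \text{ has at most } k \text{ nonzeros}\}$. Completing the square shows that the $z$-update is a projection of $(2y^{t+1} - x^t)/(1 - \beta\gamma L_F)$ onto this set. This is valid because $\gamma < 1/(\beta L_F)$. The function name, the default $\gamma$ (90% of the upper bound) and the stopping rule are our own illustrative choices.

```python
import numpy as np

def pr_sparse_ls(A, b, k, beta=5.0, gamma=None, iters=5000, tol=1e-10):
    """Sketch of (33) for F(u) = 0.5*||Au - b||^2 and G = indicator of {u : ||u||_0 <= k}."""
    n = A.shape[1]
    LF = np.linalg.eigvalsh(A.T @ A)[-1]                  # Lipschitz modulus of grad F
    gamma_max = (beta - 2.0) / ((beta + 1.0) ** 2 * LF)   # step-size bound in (33)
    if gamma is None:
        gamma = 0.9 * gamma_max
    assert beta > 2.0 and 0.0 < gamma < gamma_max
    c = 1.0 - beta * gamma * LF                           # positive since gamma < 1/(beta*LF)
    M = A.T @ A + (beta * LF + 1.0 / gamma) * np.eye(n)   # y-update solves M y = A'b + x/gamma
    Atb = A.T @ b
    x = np.zeros(n)
    for _ in range(iters):
        y = np.linalg.solve(M, Atb + x / gamma)
        w = (2.0 * y - x) / c                             # z-update: project w onto the sparse set
        z = np.zeros(n)
        keep = np.argsort(-np.abs(w), kind="stable")[:k]
        z[keep] = w[keep]
        x_new = x + 2.0 * (z - y)
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return y, z, x

# Small synthetic instance; orthogonal columns keep the example well conditioned.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((30, 10)))
A = Q * np.linspace(1.0, 2.0, 10)                         # scales column j of Q
u_true = np.zeros(10)
u_true[[1, 4, 7]] = [2.0, -1.5, 1.0]
b = A @ u_true + 0.01 * rng.standard_normal(30)
y, z, x = pr_sparse_ls(A, b, k=3)
grad = A.T @ (A @ z - b)                                  # should vanish on the support of z
```

The orthogonal-column design is used only to keep this example small and quickly convergent. Corollary 1 does not require it.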


To the best of our knowledge, the global convergence of the sequence generated from (33) is 
new; we summarize it below for concreteness. 

Corollary 1. Consider optimization problem (32) and let $\{(y^t, z^t, x^t)\}$ be the sequence generated 
from (33). Then the sequence is bounded, and any cluster point $(\bar y, \bar z, \bar x)$ satisfies $\bar y = \bar z$, and 
$\bar z$ is a stationary point of (32), that is,

$$0 \in \nabla F(\bar z) + \partial G(\bar z).$$

Proof. We first note that since (33) is just (6) applied to $f(y) = F(y) + \frac{\beta L_F}{2}\|y\|^2$ and 
$g(z) = G(z) - \frac{\beta L_F}{2}\|z\|^2$, we obtain immediately from the above discussion and Theorem 1 that $\bar y = \bar z$ and 
$\bar z$ is a stationary point of (32) for any cluster point $(\bar y, \bar z, \bar x)$. In addition, the objective function 
$f + g = F + G$ is coercive by assumption. The boundedness of the sequence $\{(y^t, z^t, x^t)\}$ now 
follows from Theorem 2. This completes the proof. □ 

3.1 Peaceman-Rachford splitting method for convex problems 

In this subsection, we suppose in addition that the $G$ in (32) is also convex. Hence, (32) is 
a convex problem. We first establish the following global (ergodic) complexity result for the 
sequence generated from (33). Similar complexity results have also been established for 
other primal-dual methods for convex optimization problems; see, for example, [33, Theorem 2]. 
We would like to emphasize that the PR splitting method we discuss here is different from the 
classical PR splitting method in the literature: we split the convex objective $F + G$ into the sum of 
a strongly convex function $f$ and a possibly nonconvex function $g$, while the classical PR splitting 
method only admits splitting into a sum of convex functions. 

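Theorem 4 (Global iteration complexity under convexity). Consider optimization problem 
(32) with $G$ being convex. Let $\{(y^t, z^t, x^t)\}$ be the sequence generated from (33) and $(\bar y, \bar z, \bar x)$ be any 
cluster point of this sequence. Then, $\bar y = \bar z$ and $\bar z$ is a solution of (32). Moreover, for any $N \ge 1$, 
we have

$$F(\tilde z^N) + G(\tilde z^N) - F(\bar z) - G(\bar z) \le \frac{1}{8\beta\gamma N L_F}\Big(\frac{1}{\gamma} - \beta L_F\Big)\|x^0 - \bar x\|^2, \qquad (34)$$

where $\tilde z^N := \frac{1}{N}\sum_{t=1}^{N} z^t$. Moreover, $\min_{0 \le t \le N}\|x^{t+1} - x^t\| = o(1/\sqrt{N})$. 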
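Proof. By Corollary 1, $\bar y = \bar z$ and $\bar z$ is a stationary point of (32); since (32) is convex, $\bar z$ is actually optimal. We now establish the inequality 
(34). First, from the first-order optimality conditions for the $y$- and $z$-updates in (33), we have

$$\begin{aligned}
-\Big(\beta L_F + \frac{1}{\gamma}\Big)y^{t+1} + \frac{1}{\gamma}x^t &= \nabla F(y^{t+1}),\\
\Big(\beta L_F - \frac{1}{\gamma}\Big)z^{t+1} - \frac{1}{\gamma}x^t + \frac{2}{\gamma}y^{t+1} &\in \partial G(z^{t+1}).
\end{aligned} \qquad (35)$$

Moreover, it is not hard to see from the definition of cluster point and (12) that (35) is also satisfied 
with $\bar x$ in place of $x^t$ and $(\bar y, \bar z)$ in place of $(y^{t+1}, z^{t+1})$. For notational simplicity, write $w_1^t := w^t - \bar w$ for $w = x$, $y$ or $z$. 
We have from (35) (and its counterpart at $(\bar y, \bar z, \bar x)$) and the monotonicity of 
convex subdifferentials that

$$\Big\langle -\Big(\beta L_F + \frac{1}{\gamma}\Big)y_1^{t+1} + \frac{1}{\gamma}x_1^t,\, y_1^{t+1}\Big\rangle \ge 0, \qquad \Big\langle \Big(\beta L_F - \frac{1}{\gamma}\Big)z_1^{t+1} - \frac{1}{\gamma}x_1^t + \frac{2}{\gamma}y_1^{t+1},\, z_1^{t+1}\Big\rangle \ge 0.$$

Summing these two relations and rearranging terms, we obtain that

$$\langle x_1^t, y_1^{t+1} - z_1^{t+1}\rangle + 2\langle y_1^{t+1}, z_1^{t+1}\rangle \ge (1 + \beta\gamma L_F)\|y_1^{t+1}\|^2 + (1 - \beta\gamma L_F)\|z_1^{t+1}\|^2. \qquad (36)$$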


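Next, observe that

$$\begin{aligned}
\langle x_1^t, y_1^{t+1} - z_1^{t+1}\rangle &= \frac{1}{2}\langle x_1^t, x_1^t - x_1^{t+1}\rangle\\
&= \frac{1}{4}\big(\|x_1^t\|^2 + \|x_1^t - x_1^{t+1}\|^2 - \|x_1^{t+1}\|^2\big)\\
&= \frac{1}{4}\big(\|x_1^t\|^2 - \|x_1^{t+1}\|^2\big) + \|y_1^{t+1}\|^2 + \|z_1^{t+1}\|^2 - 2\langle y_1^{t+1}, z_1^{t+1}\rangle,
\end{aligned} \qquad (37)$$

where the first and third equalities follow from the third relation in (33) (together with $\bar y = \bar z$), and the second equality 
follows from the elementary relation $\langle u, v\rangle = \frac{1}{2}(\|u\|^2 + \|v\|^2 - \|u - v\|^2)$ as applied to $u = x_1^t$ and 
$v = x_1^t - x_1^{t+1}$. Combining (37) with (36), we see further that

$$\frac{1}{4}\big(\|x_1^t\|^2 - \|x_1^{t+1}\|^2\big) \ge \beta\gamma L_F\big(\|y_1^{t+1}\|^2 - \|z_1^{t+1}\|^2\big). \qquad (38)$$

Next, using the fact that $\nabla F$ is Lipschitz continuous with modulus at most $L_F$, we have

$$F(z^{t+1}) \le F(y^{t+1}) + \langle \nabla F(y^{t+1}), z^{t+1} - y^{t+1}\rangle + \frac{L_F}{2}\|z^{t+1} - y^{t+1}\|^2. \qquad (39)$$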
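From this we see further that

$$\begin{aligned}
&F(z^{t+1})+G(z^{t+1})-F(\bar z)-G(\bar z)\\
&\le F(y^{t+1})-F(\bar y)+G(z^{t+1})-G(\bar z)+\langle\nabla F(y^{t+1}),z^{t+1}-y^{t+1}\rangle+\tfrac{L_F}{2}\|z^{t+1}-y^{t+1}\|^2\\
&\le\langle\nabla F(y^{t+1}),\tilde y^{t+1}\rangle+\Big\langle\Big(\beta L_F-\tfrac1\gamma\Big)z^{t+1}-\tfrac1\gamma x^t+\tfrac2\gamma y^{t+1},\tilde z^{t+1}\Big\rangle\\
&\qquad+\langle\nabla F(y^{t+1}),z^{t+1}-y^{t+1}\rangle+\tfrac{L_F}{2}\|z^{t+1}-y^{t+1}\|^2\\
&=\Big\langle\nabla F(y^{t+1})+\Big(\beta L_F-\tfrac1\gamma\Big)z^{t+1}-\tfrac1\gamma x^t+\tfrac2\gamma y^{t+1},\tilde z^{t+1}\Big\rangle+\tfrac{L_F}{2}\|z^{t+1}-y^{t+1}\|^2\\
&=\Big\langle\Big(\tfrac1\gamma-\beta L_F\Big)\tilde y^{t+1}+\Big(\beta L_F-\tfrac1\gamma\Big)\tilde z^{t+1},\tilde z^{t+1}\Big\rangle+\tfrac{L_F}{2}\|\tilde y^{t+1}-\tilde z^{t+1}\|^2\qquad(40)\\
&=\Big(\tfrac1\gamma-\beta L_F\Big)\langle\tilde y^{t+1}-\tilde z^{t+1},\tilde z^{t+1}\rangle+\tfrac{L_F}{2}\|\tilde y^{t+1}-\tilde z^{t+1}\|^2\\
&=\tfrac12\Big(\tfrac1\gamma-\beta L_F\Big)\big(\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\big)+\tfrac{1}{2\gamma}\big((1+\beta)\gamma L_F-1\big)\|z^{t+1}-y^{t+1}\|^2\\
&\le\tfrac12\Big(\tfrac1\gamma-\beta L_F\Big)\big(\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\big)\le\frac{1}{8\beta\gamma L_F}\Big(\tfrac1\gamma-\beta L_F\Big)\big(\|\tilde x^t\|^2-\|\tilde x^{t+1}\|^2\big),
\end{aligned}$$

where: the first inequality follows from (39) and the fact that $\bar z=\bar y$; the second inequality follows from the subdifferential inequalities applied to $F$ and $G$ at the points $y^{t+1}$ and $z^{t+1}$, respectively, together with the second relation in (35); the second equality follows from the first relation in (35); the fourth equality follows from the elementary relation $\langle u,v\rangle=\frac12(\|u+v\|^2-\|u\|^2-\|v\|^2)$ applied to $u=\tilde z^{t+1}$ and $v=\tilde y^{t+1}-\tilde z^{t+1}$; the second last inequality follows from the fact that $0<\gamma<\frac{1}{(1+\beta)L_F}$, so that $(1+\beta)\gamma L_F-1<0$ (and, in particular, $\frac1\gamma-\beta L_F>0$); and the last inequality follows from (38).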

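Summing both sides of (40) from $t=0$ to $N-1$ (with $N\ge1$) and using the convexity of $F+G$, we have

$$F(\hat z^N)+G(\hat z^N)-F(\bar z)-G(\bar z)\le\frac1N\sum_{t=0}^{N-1}\big(F(z^{t+1})+G(z^{t+1})-F(\bar z)-G(\bar z)\big)\le\frac{1}{8\beta\gamma NL_F}\Big(\frac1\gamma-\beta L_F\Big)\|x^0-\bar x\|^2,$$

where $\hat z^N$ is defined in the statement of the theorem. This proves the first assertion of the theorem.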
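Finally, observe from the last equality in (40) that for all $t\ge0$,

$$0\le F(z^{t+1})+G(z^{t+1})-F(\bar z)-G(\bar z)\le\frac12\Big(\frac1\gamma-\beta L_F\Big)\big(\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\big)+\frac{1}{2\gamma}\big((1+\beta)\gamma L_F-1\big)\|z^{t+1}-y^{t+1}\|^2,$$

where the first inequality follows from the optimality of $\bar z$. Rearranging terms in the above relation, we see further that

$$\frac12\Big(\frac1\gamma-(1+\beta)L_F\Big)\|z^{t+1}-y^{t+1}\|^2\le\frac12\Big(\frac1\gamma-\beta L_F\Big)\big(\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\big).$$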


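Using this relation and the definition of the $x$-update, we obtain

$$\begin{aligned}\frac14\sum_{t=0}^{N-1}\|x^{t+1}-x^t\|^2&=\sum_{t=0}^{N-1}\|z^{t+1}-y^{t+1}\|^2\le\frac{\gamma}{1-(1+\beta)\gamma L_F}\Big(\frac1\gamma-\beta L_F\Big)\sum_{t=0}^{N-1}\big(\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\big)\\&\le\frac{1}{4\beta L_F\big(1-(1+\beta)\gamma L_F\big)}\Big(\frac1\gamma-\beta L_F\Big)\|x^0-\bar x\|^2,\end{aligned}$$

where the last inequality is due to (38). Thus, $\sum_{t=0}^{\infty}\|x^{t+1}-x^t\|^2<+\infty$, and so $\sum_{t=N}^{2N-1}\|x^{t+1}-x^t\|^2\to0$ as $N\to\infty$. Now consider $a_N:=\min_{0\le t\le N}\{\|x^{t+1}-x^t\|^2\}$ for all $N\ge0$. Then $a_{N+1}\le a_N$ for all $N\ge0$, and

$$Na_{2N}\le a_N+\cdots+a_{2N-1}\le\sum_{t=N}^{2N-1}\|x^{t+1}-x^t\|^2\to0.$$

This implies that $a_N=o(1/N)$. Therefore, the conclusion follows. This completes the proof. □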

Next, we show that the PR splitting method exhibits linear convergence in solving (32) if $G$ is convex and $F+G$ is strongly convex. We note that, for the classical PR splitting method, linear convergence under strong convexity is known; see [26, Remark 10 and Proposition 4]. As explained before, here we are considering a different PR splitting method.

Proposition 1. (Linear convergence under strong convexity) Consider optimization problem (32) with $G$ being convex. Suppose in addition that $F+G$ is strongly convex. Let $\{(y^t,z^t,x^t)\}$ be the sequence generated from (33). Then $\{(y^t,z^t,x^t)\}$ converges linearly to $(\bar y,\bar z,\bar x)$ with $\bar y=\bar z$ and $\bar z$ being the unique optimal solution of (32), i.e., there exist $M>0$ and $r\in(0,1)$ such that for all $t\ge1$,

$$\max\{\|y^t-\bar y\|^2,\|z^t-\bar z\|^2,\|x^t-\bar x\|^2\}\le Mr^t.$$

Proof. Let $(\bar y,\bar z,\bar x)$ be any cluster point of the sequence $\{(y^t,z^t,x^t)\}$. As before, we write $\tilde w^t=w^t-\bar w$ for $w=x$, $y$ or $z$ for notational simplicity. From the preceding theorem, $\bar y=\bar z$ and $\bar z$ is optimal for (32). Since $F+G$ is strongly convex, the optimal solution of (32) exists and is unique. Consequently, the whole sequence $\{(y^t,z^t)\}$ converges to the unique limit $(\bar z,\bar z)$, where $\bar z$ is the unique solution of (32). From this and (35), one can deduce that $\{x^t\}$ is also convergent, and hence converges to $\bar x$. We next establish linear convergence.

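Denote the strong convexity modulus of $F+G$ by $\sigma_1$. From (40), the strong convexity of $F+G$ and the fact that $\bar z$ is the solution of (32), we see that for all $t\ge1$,

$$\frac{\sigma_1}{2}\|\tilde z^{t+1}\|^2\le F(z^{t+1})+G(z^{t+1})-F(\bar z)-G(\bar z)\le C_1\big(\|\tilde x^t\|^2-\|\tilde x^{t+1}\|^2\big),\qquad(41)$$

where $C_1:=\frac{1}{8\beta\gamma L_F}\big(\frac1\gamma-\beta L_F\big)$. Moreover, from the last inequality in (40), we have for all $t\ge1$,

$$C_2\big(\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\big)\le C_1\big(\|\tilde x^t\|^2-\|\tilde x^{t+1}\|^2\big),$$

where $C_2:=\frac12\big(\frac1\gamma-\beta L_F\big)$. It then follows that

$$\|\tilde y^{t+1}\|^2-\|\tilde z^{t+1}\|^2\le\frac{C_1}{C_2}\big(\|\tilde x^t\|^2-\|\tilde x^{t+1}\|^2\big).$$

This together with (41) gives us that for all $t\ge1$,

$$\|\tilde y^{t+1}\|^2\le\Big(\frac{C_1}{C_2}+\frac{2C_1}{\sigma_1}\Big)\big(\|\tilde x^t\|^2-\|\tilde x^{t+1}\|^2\big).\qquad(42)$$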

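On the other hand, note from the first relation in (35) that

$$-\Big(\beta L_F+\frac1\gamma\Big)\tilde y^{t+1}+\frac1\gamma\tilde x^t=\nabla F(y^{t+1})-\nabla F(\bar y).$$

This together with the Lipschitz continuity of $\nabla F$ implies that

$$-\Big(\beta L_F+\frac1\gamma\Big)\|\tilde y^{t+1}\|+\frac1\gamma\|\tilde x^t\|\le\|\nabla F(y^{t+1})-\nabla F(\bar y)\|\le L_F\|\tilde y^{t+1}\|,$$

and consequently, $\|\tilde x^t\|\le\big((1+\beta)\gamma L_F+1\big)\|\tilde y^{t+1}\|$. Thus, we obtain that, for all $t\ge1$,

$$\frac{1}{\big((1+\beta)\gamma L_F+1\big)^2}\|\tilde x^t\|^2\le\|\tilde y^{t+1}\|^2\le\Big(\frac{C_1}{C_2}+\frac{2C_1}{\sigma_1}\Big)\big(\|\tilde x^t\|^2-\|\tilde x^{t+1}\|^2\big).$$

This shows that there exists $r\in(0,1)$ such that

$$\|\tilde x^{t+1}\|^2\le r\|\tilde x^t\|^2\quad\text{for all }t\ge1;$$

indeed, rearranging the previous display gives $\|\tilde x^{t+1}\|^2\le\big(1-\frac{1}{\kappa^2D}\big)\|\tilde x^t\|^2$ with $\kappa:=(1+\beta)\gamma L_F+1$ and $D:=\frac{C_1}{C_2}+\frac{2C_1}{\sigma_1}$, so any $r\in(0,1)$ with $r\ge1-\frac{1}{\kappa^2D}$ works. It follows that

$$\|\tilde x^t\|^2\le\|x^0-\bar x\|^2r^t\quad\text{for all }t\ge1.$$

Moreover, from (41) and (42), this further yields that, for all $t\ge1$,

$$\|\tilde z^{t+1}\|^2\le\frac{2C_1}{\sigma_1}\|\tilde x^t\|^2\le\frac{2C_1}{\sigma_1}\|x^0-\bar x\|^2r^t$$

and

$$\|\tilde y^{t+1}\|^2\le\Big(\frac{C_1}{C_2}+\frac{2C_1}{\sigma_1}\Big)\|\tilde x^t\|^2\le\Big(\frac{C_1}{C_2}+\frac{2C_1}{\sigma_1}\Big)\|x^0-\bar x\|^2r^t.$$

Therefore, the conclusion follows. □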
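The contraction factor obtained in the proof can be compared with what the iteration actually does. The sketch below reuses a toy instance of our own choosing (not from this paper): $F(u)=\frac12\|u-c\|^2$, so $L_F=1$ and $F+G$ has modulus $\sigma_1=1$, and $G$ the indicator of $[-1,1]^n$. It computes $r=1-1/(\kappa^2D)$ from the constants $C_1$, $C_2$, $\kappa$, $D$ above and records the largest observed ratio $\|\tilde x^{t+1}\|^2/\|\tilde x^t\|^2$. The step size $\gamma=0.9/((1+\beta)L_F)$ and the function names are assumptions for illustration.

```python
import random

def clip(v):
    """Projection of a scalar onto [-1, 1]."""
    return min(1.0, max(-1.0, v))

def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def contraction_check(c, x0, beta=2.2, iters=60):
    """Toy check of Proposition 1 for F(u) = 0.5*||u - c||^2 (L_F = 1, sigma_1 = 1) and
    G = indicator of [-1, 1]^n.  Returns (largest observed ratio, r from the proof)."""
    L_F, sigma1 = 1.0, 1.0
    gamma = 0.9 / ((1.0 + beta) * L_F)          # assumption: (1 + beta) * gamma * L_F < 1
    kappa = (1.0 + beta) * gamma * L_F + 1.0
    C1 = (1.0 / gamma - beta * L_F) / (8.0 * beta * gamma * L_F)
    C2 = 0.5 * (1.0 / gamma - beta * L_F)
    D = C1 / C2 + 2.0 * C1 / sigma1
    r_proof = 1.0 - 1.0 / (kappa ** 2 * D)
    zbar = [clip(ci) for ci in c]
    xbar = [kappa * zi - gamma * ci for zi, ci in zip(zbar, c)]   # fixed point (L_F = 1)
    x, worst = list(x0), 0.0
    for _ in range(iters):
        y = [(gamma * ci + xi) / kappa for ci, xi in zip(c, x)]
        z = [clip((2 * yi - xi) / (1.0 - beta * gamma)) for yi, xi in zip(y, x)]
        x_new = [xi + 2 * (zi - yi) for xi, zi, yi in zip(x, z, y)]
        e_old, e_new = sqdist(x, xbar), sqdist(x_new, xbar)
        if e_old > 1e-16:                       # ignore ratios dominated by round-off
            worst = max(worst, e_new / e_old)
        x = x_new
    return worst, r_proof

random.seed(1)
c = [random.uniform(-3.0, 3.0) for _ in range(6)]
x0 = [random.uniform(-2.0, 2.0) for _ in range(6)]
worst, r_proof = contraction_check(c, x0)
```

On such instances the bound from the proof is typically conservative; the observed ratio only has to stay below it.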


4 Applications 

In this section, we apply the PR splitting method (33) to solving two important classes of nonconvex optimization problems, constrained least squares problems and feasibility problems, based on our discussion in Section 3.


Constrained least squares problems. A common type of problem that arises in statistics and machine learning is the following constrained least squares problem:

$$\min_{u\in D}\ \frac12\|Au-b\|^2,\qquad(43)$$

where $A$ is a linear map, $b$ is a vector of suitable dimension, and $D$ is a nonempty compact set that is not necessarily convex. See [25, 52] for concrete examples of (43).

The classical PR splitting method applied to (43) does not have a convergence guarantee. As an alternative, as discussed in Section 3, we can set $f(y)=\frac12\|Ay-b\|^2+\frac{\beta\lambda_{\max}(A^TA)}{2}\|y\|^2$ and $g(z)=\delta_D(z)-\frac{\beta\lambda_{\max}(A^TA)}{2}\|z\|^2$, and apply the PR splitting method accordingly.

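We next discuss computation of the proximal mappings. We start with the proximal mapping of $\gamma g$. From the definition, for each $w$, the proximal mapping gives the set of minimizers of

$$\min_{z\in D}\Big\{-\frac{\beta\lambda_{\max}(A^TA)}{2}\|z\|^2+\frac{1}{2\gamma}\|z-w\|^2\Big\}.$$

It is clear that this set is given by $P_D\big(\frac{w}{1-\beta\lambda_{\max}(A^TA)\gamma}\big)$ since $\gamma<\frac{1}{\beta\lambda_{\max}(A^TA)}$: completing the square, the objective equals $\frac{1-\beta\lambda_{\max}(A^TA)\gamma}{2\gamma}\big\|z-\frac{w}{1-\beta\lambda_{\max}(A^TA)\gamma}\big\|^2$ plus a constant, with a positive leading coefficient. On the other hand, to compute the proximal mapping for $\gamma f$, we consider the following optimization problem for each $w$:

$$\min_y\Big\{\frac12\|Ay-b\|^2+\frac{\beta\lambda_{\max}(A^TA)}{2}\|y\|^2+\frac{1}{2\gamma}\|y-w\|^2\Big\},$$

whose unique minimizer is given by

$$y=\big[(\beta\gamma\lambda_{\max}(A^TA)+1)I+\gamma A^TA\big]^{-1}(w+\gamma A^Tb).$$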
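The two proximal mappings just derived can be sketched in NumPy. For concreteness we take $D=\{x:\|x\|_0\le r,\ \|x\|_\infty\le M\}$, the set used in the experiments below; the function names are hypothetical. One element of $P_D(v)$ keeps the $r$ entries of largest magnitude and clips them to $[-M,M]$ (keeping coordinate $i$ reduces the squared distance by an amount that increases with $|v_i|$). $\lambda_{\max}(A^TA)$ is computed as the squared spectral norm of $A$.

```python
import numpy as np

def proj_sparse_box(v, r, M):
    """Return one element of P_D(v), D = {x : ||x||_0 <= r, ||x||_inf <= M}."""
    out = np.zeros_like(v)
    keep = np.argsort(-np.abs(v))[:r]       # indices of the r largest |v_i| (ties: arbitrary)
    out[keep] = np.clip(v[keep], -M, M)     # clip the kept entries to [-M, M]
    return out

def prox_g(w, gamma, beta, lam, r, M):
    """Proximal mapping of gamma*g, g = delta_D - (beta*lam/2)||.||^2."""
    if gamma * beta * lam >= 1.0:
        raise ValueError("need gamma < 1/(beta * lambda_max(A^T A))")
    return proj_sparse_box(w / (1.0 - beta * lam * gamma), r, M)

def prox_f(w, A, b, gamma, beta, lam):
    """Unique minimizer of 0.5||Ay - b||^2 + (beta*lam/2)||y||^2 + ||y - w||^2/(2 gamma)."""
    n = A.shape[1]
    K = (beta * gamma * lam + 1.0) * np.eye(n) + gamma * (A.T @ A)
    return np.linalg.solve(K, w + gamma * (A.T @ b))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 8))
b = rng.standard_normal(5)
w = rng.standard_normal(8)
lam = np.linalg.norm(A, 2) ** 2             # lambda_max(A^T A) = (largest singular value)^2
beta = 2.2
gamma = 0.5 / (beta * lam)                  # any gamma < 1/(beta*lam) keeps prox_g well defined
y = prox_f(w, A, b, gamma, beta, lam)
z = prox_g(w, gamma, beta, lam, 3, 1.0)
```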
Thus, the PR splitting method for (43) can be stated as follows:

PR splitting method for (43)

Step 0. Input $x^0$, $\beta>2$ and $\gamma\in(0,\ldots)$.

Step 1. Set

$$\begin{aligned}y^{t+1}&=\big[(\beta\gamma\lambda_{\max}(A^TA)+1)I+\gamma A^TA\big]^{-1}(x^t+\gamma A^Tb),\\ z^{t+1}&\in P_D\Big(\frac{2y^{t+1}-x^t}{1-\beta\lambda_{\max}(A^TA)\gamma}\Big),\\ x^{t+1}&=x^t+2(z^{t+1}-y^{t+1}).\end{aligned}\qquad(44)$$

Step 2. If a termination criterion is not met, go to Step 1.


As a consequence of Corollary 1, we see that Algorithm (44) generates a bounded sequence such that any of its cluster points gives a stationary point of (43). We note that this global convergence result for (44) is new even when $D$ is convex.
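A compact sketch of Algorithm (44) in NumPy is given below, with the origin as starting point and criterion (45) for termination. It uses a fixed, purely illustrative step size $\gamma=0.5/(\beta\lambda_{\max}(A^TA))$, which only guarantees that the $z$-update is well defined; it is not the threshold from Corollary 1, and the $\gamma$-update heuristic used in the experiments is omitted. All names are hypothetical. Because $m\ll n$ in the experiments, the $y$-update applies $\big(cI+\gamma A^TA\big)^{-1}$ through the Woodbury identity, so only an $m\times m$ system is solved.

```python
import numpy as np

def make_inverse(A, gamma, c):
    """Return v -> (c I + gamma A^T A)^{-1} v via the Woodbury identity (an m x m solve)."""
    m = A.shape[0]
    S = c * np.eye(m) + gamma * (A @ A.T)
    return lambda v: (v - gamma * (A.T @ np.linalg.solve(S, A @ v))) / c

def pr_cls(A, b, r, M=1e6, beta=2.2, gamma=None, tol=1e-6, max_iter=10000):
    """Sketch of algorithm (44) with a FIXED step size; the gamma-update heuristic is omitted."""
    n = A.shape[1]
    lam = np.linalg.norm(A, 2) ** 2                     # lambda_max(A^T A)
    if gamma is None:
        gamma = 0.5 / (beta * lam)                      # illustrative only, not the bound of Corollary 1
    apply_inv = make_inverse(A, gamma, beta * gamma * lam + 1.0)
    Atb = A.T @ b

    def proj_D(v):                                      # one element of P_D(v)
        out = np.zeros_like(v)
        keep = np.argsort(-np.abs(v))[:r]
        out[keep] = np.clip(v[keep], -M, M)
        return out

    x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)     # initialized at the origin
    for t in range(1, max_iter + 1):
        y_new = apply_inv(x + gamma * Atb)
        z_new = proj_D((2.0 * y_new - x) / (1.0 - beta * lam * gamma))
        x_new = x + 2.0 * (z_new - y_new)
        num = max(np.linalg.norm(x_new - x), np.linalg.norm(y_new - y), np.linalg.norm(z_new - z))
        den = max(np.linalg.norm(x), np.linalg.norm(y), np.linalg.norm(z), 1.0)
        x, y, z = x_new, y_new, z_new
        if num / den < tol:                             # termination criterion (45)
            break
    return z, t

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 10))
b = rng.standard_normal(6)
z, iters = pr_cls(A, b, r=3, max_iter=500)
```

For large problems one would factor the $m\times m$ matrix once (for example by a Cholesky factorization) instead of calling a general solver in every iteration.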

To illustrate our proposed approach, we now test the PR splitting method (44) on solving (43). We compare our algorithm against the DR splitting method in [25]. Our initialization and termination criteria for both algorithms are the same as in [25, Section 5]; both algorithms are initialized at the origin and terminated when

$$\frac{\max\{\|x^t-x^{t-1}\|,\|y^t-y^{t-1}\|,\|z^t-z^{t-1}\|\}}{\max\{\|x^{t-1}\|,\|y^{t-1}\|,\|z^{t-1}\|,1\}}<\mathrm{tol}\qquad(45)$$

for some $\mathrm{tol}>0$. Note that, in general, the upper bound on $\gamma$ in algorithm (44) might be too small in practical computation. Thus, following a technique used in [25, Section 5] for the DR splitting method, we adopt a heuristic for the PR splitting method in our numerical simulation, which combines algorithm (44) with a specific update rule for the parameter $\gamma$. In particular, we set $\beta=2.2$ and start with $\gamma=0.93/(\beta\lambda_{\max}(A^TA))$. We then update $\gamma$ as $\max\{\gamma/2,\ 0.9999\cdot\gamma_1\}$ whenever $\gamma>\gamma_1:=\cdots$ and the sequence satisfies either $\|y^t-y^{t-1}\|>\cdots$ or $\|y^t\|>10^{10}$.

Following a similar discussion as in [25, Remark 4], one can show that this heuristic leads to a bounded sequence which clusters at a stationary point of (43). On the other hand, for the DR splitting method, we use the same heuristics described in [25, Section 5] for updating $\gamma$, but we consider three different initial values of $\gamma$: $k\cdot\gamma_0$ for $k=10$, $30$ and $50$, where $\gamma_0$ is the multiple of $1/\lambda_{\max}(A^TA)$ used in [25, Section 5]. These variants are denoted by DR10, DR30 and DR50, respectively.

In our first numerical experiment, we first randomly generate an $m\times n$ matrix $A$, a noise vector $\epsilon\in\mathbb{R}^m$, and also a vector $\hat x\in\mathbb{R}^r$ with $r=\cdots$, all with i.i.d. standard Gaussian entries. We further scale each column of $A$ to have norm 1. Next, we generate a random sparse vector $\tilde x\in\mathbb{R}^n$ by first setting $\tilde x=0$ and then assigning the entries of $\hat x$ to $r$ randomly chosen entries of $\tilde x$. Finally, we set $b=A\tilde x+0.01\cdot\epsilon$ and $D=\{x\in\mathbb{R}^n:\|x\|_0\le r,\ \|x\|_\infty\le10^6\}$; here $\|x\|_0$ denotes the cardinality of $x$ and $\|x\|_\infty$ is the $\ell_\infty$ norm of $x$.
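The instance generation just described can be sketched as follows. Since the value of $r$ is not given here, it is left as a free parameter; the function names and the seed handling are hypothetical choices for illustration.

```python
import numpy as np

def make_cls_instance(m, n, r, seed=0):
    """Random constrained least squares instance in the style described above."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A = A / np.linalg.norm(A, axis=0)             # scale each column to unit norm
    eps = rng.standard_normal(m)                  # noise vector
    x_hat = rng.standard_normal(r)
    x_tilde = np.zeros(n)
    support = rng.choice(n, size=r, replace=False)  # r distinct random positions
    x_tilde[support] = x_hat
    b = A @ x_tilde + 0.01 * eps
    return A, b, x_tilde

def in_D(x, r, M=1e6):
    """Membership test for D = {x : ||x||_0 <= r, ||x||_inf <= M}."""
    return np.count_nonzero(x) <= r and np.max(np.abs(x)) <= M

A, b, xt = make_cls_instance(20, 100, 4, seed=3)
```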

We generate 50 random instances as described above for each pair $(m,n)$, where $m\in\{100,200,300,400,500\}$ and $n\in\{4000,5000,6000\}$. Our results are reported in Table 1, where we present the number of iterations and the function value at termination, averaged over the 50 instances. One can observe that the PR splitting method is faster than the DR splitting methods for larger $m$. Besides, the function values obtained by the PR splitting method are usually comparable with DR30, worse than DR50 and better than DR10.

We choose $\mathrm{tol}=10^{-6}$, and we report $\frac12\|Az^t-b\|^2$ for both methods.
Table 1: Comparing DR10, DR30, DR50 and PR splitting for the constrained least squares problem on random instances.

Data           DR10              DR30              DR50              PR
  m      n     iter  fval        iter  fval        iter  fval        iter  fval
100   4000      805  5.00e-01     225  2.67e-01     274  7.73e-02     324  3.17e-01
100   5000      962  6.43e-01     252  4.96e-01     291  2.06e-01     370  4.95e-01
100   6000     1137  6.18e-01     326  5.02e-01     301  2.53e-01     436  4.76e-01
200   4000      508  5.32e-01     172  4.74e-02     217  9.20e-03     185  7.59e-02
200   5000      624  5.78e-01     195  6.93e-02     234  9.10e-03     224  2.06e-01
200   6000      723  6.93e-01     220  1.60e-01     250  8.94e-03     281  1.77e-01
300   4000      415  1.41e-01     141  1.33e-02     184  1.31e-02     123  1.39e-02
300   5000      489  2.70e-01     154  1.39e-02     201  1.35e-02     150  1.42e-02
300   6000      567  5.20e-01     170  1.36e-02     215  1.32e-02     187  1.44e-02
400   4000      322  4.35e-02     124  1.78e-02     166  1.75e-02      91  1.79e-02
400   5000      406  9.08e-02     137  1.77e-02     179  1.75e-02     115  1.83e-02
400   6000      481  1.48e-01     148  1.82e-02     194  1.77e-02     140  1.85e-02
500   4000      258  2.53e-02     114  2.26e-02     160  2.23e-02      75  2.27e-02
500   5000      314  2.97e-02     124  2.20e-02     166  2.17e-02      92  2.22e-02
500   6000      406  4.05e-02     135  2.25e-02     178  2.22e-02     112  2.27e-02


We also perform experiments using real data. We consider four sets of real data for the $A$ and $b$ used in (43): leukemia data, lymph node status data, breast cancer prognosis data and colon tumor gene expression data. We use the leukemia data pre-processed in [34], which has 3501 genes and 72 samples. The lymph node status data we use are pre-processed in [14], with 4514 genes and 148 samples. The breast cancer prognosis data we use are pre-processed in [34], containing 4919 genes and 76 samples. Finally, for the colon tumor gene expression data, we use the data pre-processed in [10], with 2000 genes and 62 samples.

Similar to […, Section 3.3], for all the data, we first standardize $A$ and $b$ so that each column has mean 0 and variance 1, and then scale the columns of $A$ to have unit norm. For the $A$ and $b$ thus constructed, we solve (43) with $D=\{x\in\mathbb{R}^n:\|x\|_0\le r,\ \|x\|_\infty\le10^6\}$ for $r=10$, $20$, $30$ by the PR splitting method (44), and compare it with DR10, DR30 and DR50. Our numerical results are presented in Table 2, where one can see that PR is slower than DR50 and faster than DR10. Moreover, it usually outperforms DR30 in terms of function values, and its speed is comparable with DR30 for the Breast and the Colon data.
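The preprocessing step can be sketched as below. The text does not say whether the variance uses the population or the sample normalization; the sketch assumes the population version (NumPy's default `ddof=0`) and does not handle constant columns. The function name is hypothetical.

```python
import numpy as np

def standardize(A, b):
    """Center/scale the columns of A and the vector b to mean 0, variance 1, then
    rescale the columns of A to unit Euclidean norm."""
    A = (A - A.mean(axis=0)) / A.std(axis=0)     # population std (ddof=0) is an assumption
    b = (b - b.mean()) / b.std()
    A = A / np.linalg.norm(A, axis=0)            # unit-norm columns (means stay 0)
    return A, b

rng = np.random.default_rng(6)
A0 = rng.uniform(0.0, 5.0, size=(30, 7))
b0 = rng.uniform(0.0, 5.0, size=30)
A, b = standardize(A0, b0)
```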


Table 2: Comparing DR10, DR30, DR50 and PR splitting on real data.

Data        r       DR10               DR30               DR50               PR
                    iter   fval        iter   fval        iter   fval        iter   fval
Leukemia   10       8242   2.40e+00    1805   3.92e+00    1229   3.92e+00    3461   2.47e+00
Leukemia   20       7890   2.32e+00    3727   6.09e-01    3065   5.81e-01    6608   3.05e-01
Leukemia   30      12530   2.24e-01    5011   3.01e-01    2988   1.47e-01    8265   1.20e-01
Lymph      10       1345   2.93e+01     758   2.90e+01     496   2.90e+01    1297   2.76e+01
Lymph      20       5912   2.26e+01    1910   1.91e+01     895   1.73e+01    2529   1.84e+01
Lymph      30       9354   7.91e+00    1883   1.34e+01     939   1.44e+01    2089   8.27e+00
Breast     10       2338   1.28e+01    2705   9.33e+00    1095   8.40e+00    1656   1.33e+01
Breast     20      14359   2.90e+00    2345   3.53e+00    2824   4.11e+00    2906   2.81e+00
Breast     30       9905   6.96e-01    5162   1.33e+00    3802   7.50e-01    8241   9.58e-01
Colon      10       7072   8.08e+00    4313   8.08e+00    3352   8.08e+00    4463   8.08e+00
Colon      20      14393   3.20e+00    7011   1.95e+00    9798   2.29e+00    6187   1.89e+00
Colon      30      18361   7.17e-01    8952   6.45e-01    4922   7.26e-01   10937   1.33e+00


We choose $\mathrm{tol}=10^{\cdots}$, and we report $\frac12\|Az^t-b\|^2$ for both methods.
Feasibility problems. Another important problem in optimization is the feasibility problem […]. We consider the following simple version: finding a point in the intersection of a nonempty closed convex set $C$ and a nonempty compact set $D$. It is well known that this problem can be modeled via (32) by setting $F(u)=\frac12 d_C^2(u)$ and $G(u)=\delta_D(u)$; see, for example, [27]. For this choice of $F$, we have $L_F=1$.

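As before, it can be shown that the proximal mapping of $\gamma g$ is given by $P_D\big(\frac{\cdot}{1-\beta\gamma}\big)$ since $\gamma<\frac1\beta$.

We next compute the proximal mapping for $\gamma f$ in this case. From the definition, for each $w$, we consider the following optimization problem:

$$\begin{aligned}&\min_y\Big\{\frac12 d_C^2(y)+\frac\beta2\|y\|^2+\frac{1}{2\gamma}\|y-w\|^2\Big\}\\&=\min_{u\in C}\min_y\Big\{\frac12\|y-u\|^2+\frac\beta2\|y\|^2+\frac{1}{2\gamma}\|y-w\|^2\Big\}.\end{aligned}\qquad(46)$$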


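Notice that the inner minimization on the right-hand side is attained at

$$y=\frac{\gamma u+w}{(1+\beta)\gamma+1}.\qquad(47)$$

Plugging (47) back into (46), we see further that the optimal value of (46) equals

$$\frac{1}{((1+\beta)\gamma+1)^2}\min_{u\in C}\Big\{\frac12\|(1+\beta\gamma)u-w\|^2+\frac\beta2\|\gamma u+w\|^2+\frac\gamma2\|u-(1+\beta)w\|^2\Big\}.\qquad(48)$$

It is routine to show that the minimum in (48) is attained at

$$u=P_C\Big(\frac{w}{1+\beta\gamma}\Big);$$

indeed, up to an additive constant, the objective in (48) equals $\frac12(1+\beta\gamma)((1+\beta)\gamma+1)\|u\|^2-((1+\beta)\gamma+1)\langle u,w\rangle$, which is a positive multiple of $\|u-\frac{w}{1+\beta\gamma}\|^2$ plus a constant. Combining this with (47), the proximal mapping of $\gamma f$ at $w$ is given by

$$\frac{\gamma P_C\big(\frac{w}{1+\beta\gamma}\big)+w}{(1+\beta)\gamma+1}.$$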
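For the affine set $C=\{x:Ax=b\}$ used in the experiments below, the closed form above can be coded and checked against the optimality condition of (46), namely $y-P_C(y)+\beta y+(y-w)/\gamma=0$ (the gradient of $\frac12 d_C^2$ is $y-P_C(y)$). The sketch assumes $A$ has full row rank so that $P_C(v)=v-A^T(AA^T)^{-1}(Av-b)$; the function names are hypothetical.

```python
import numpy as np

def proj_affine(v, A, b):
    """P_C(v) for C = {x : Ax = b}; assumes A has full row rank."""
    return v - A.T @ np.linalg.solve(A @ A.T, A @ v - b)

def prox_f_feas(w, A, b, gamma, beta):
    """Closed form derived above: (gamma*P_C(w/(1+beta*gamma)) + w) / ((1+beta)*gamma + 1)."""
    u = proj_affine(w / (1.0 + beta * gamma), A, b)
    return (gamma * u + w) / ((1.0 + beta) * gamma + 1.0)

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 7))
b = rng.standard_normal(3)
w = rng.standard_normal(7)
gamma, beta = 0.3, 2.2
y = prox_f_feas(w, A, b, gamma, beta)
```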

Thus, the PR splitting method for (32) with $F(u)=\frac12 d_C^2(u)$ and $G(u)=\delta_D(u)$ can be described as follows:

PR splitting method for (32) with $F(u)=\frac12 d_C^2(u)$ and $G(u)=\delta_D(u)$

Step 0. Input $x^0$, $\beta>2$ and $\gamma\in(0,\ldots)$.

Step 1. Set

$$\begin{aligned}y^{t+1}&=\frac{\gamma P_C\big(\frac{x^t}{1+\beta\gamma}\big)+x^t}{(1+\beta)\gamma+1},\\ z^{t+1}&\in P_D\Big(\frac{2y^{t+1}-x^t}{1-\beta\gamma}\Big),\\ x^{t+1}&=x^t+2(z^{t+1}-y^{t+1}).\end{aligned}\qquad(49)$$

Step 2. If a termination criterion is not met, go to Step 1.


Similarly, as an immediate consequence of Corollary 1, we see that Algorithm (49) generates a bounded sequence such that any of its cluster points gives a stationary point of (32). We would like to point out that this global convergence result for (49) is new even when $D$ is also convex.
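A minimal NumPy sketch of Algorithm (49) for $C=\{x:Ax=b\}$ and $D=\{x:\|x\|_0\le r,\ \|x\|_\infty\le M\}$ follows, started at the origin and stopped by (45). As with the least squares sketch, it uses a fixed step size $\gamma=0.5/\beta$, an illustrative choice that only ensures $\gamma<1/\beta$ for the $z$-update, and it omits the $\gamma$-update heuristic. The function name and defaults are hypothetical, and $A$ is assumed to have full row rank.

```python
import numpy as np

def pr_feasibility(A, b, r, M=1e6, beta=2.2, gamma=None, tol=1e-6, max_iter=20000):
    """Sketch of algorithm (49) for C = {x : Ax = b}, D = {x : ||x||_0 <= r, ||x||_inf <= M},
    with a FIXED step size (the gamma-update heuristic is omitted)."""
    n = A.shape[1]
    if gamma is None:
        gamma = 0.5 / beta                      # illustrative; the z-update needs gamma < 1/beta
    AAt = A @ A.T

    def proj_C(v):                              # assumes A has full row rank
        return v - A.T @ np.linalg.solve(AAt, A @ v - b)

    def proj_D(v):                              # one element of P_D(v)
        out = np.zeros_like(v)
        keep = np.argsort(-np.abs(v))[:r]
        out[keep] = np.clip(v[keep], -M, M)
        return out

    s = (1.0 + beta) * gamma + 1.0
    x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
    for t in range(1, max_iter + 1):
        y_new = (gamma * proj_C(x / (1.0 + beta * gamma)) + x) / s
        z_new = proj_D((2.0 * y_new - x) / (1.0 - beta * gamma))
        x_new = x + 2.0 * (z_new - y_new)
        num = max(np.linalg.norm(x_new - x), np.linalg.norm(y_new - y), np.linalg.norm(z_new - z))
        den = max(np.linalg.norm(x), np.linalg.norm(y), np.linalg.norm(z), 1.0)
        x, y, z = x_new, y_new, z_new
        if num / den < tol:                     # termination criterion (45)
            break
    fval = 0.5 * float(np.sum((z - proj_C(z)) ** 2))   # (1/2) d_C(z)^2 at the returned point
    return z, fval, t

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 60))
x_true = np.zeros(60)
x_true[[3, 17, 41]] = rng.standard_normal(3)
b = A @ x_true
z, fval, iters = pr_feasibility(A, b, r=3, max_iter=3000)
```

A returned value `fval` near zero indicates that the returned $z\in D$ (approximately) solves the linear system as well.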
As an illustration of our proposed approach, we now test the PR splitting method on solving (32) with $F(u)=\frac12 d_C^2(u)$ and $G(u)=\delta_D(u)$ via MATLAB experiments. We again benchmark our algorithm against the DR splitting method in [25]. Both algorithms are initialized at the origin and terminated when (45) is satisfied with $\mathrm{tol}=10^{-6}$. Also, as in the previous subsection, we adopt a heuristic for updating $\gamma$ following the technique used in [25, Section 5]. Specifically, for the PR splitting method (49), we set $\beta=2.2$, start with $\gamma=0.93/\beta$, and update $\gamma$ as $\max\{\gamma/2,\ 0.9999\cdot\gamma_1\}$ whenever $\gamma>\gamma_1:=\cdots$ and the sequence satisfies either $\|y^t-y^{t-1}\|>\cdots$ or $\|y^t\|>10^{10}$. Following a similar discussion as in [25, Remark 4], this heuristic can be shown to give a bounded sequence that clusters at a stationary point of (32). On the other hand, for the DR splitting method, we adopt the same heuristics described in [25, Section 5] for updating $\gamma$, but we consider three different initial values of $\gamma$: $k\cdot\gamma_0$ for $k=50$, $100$ and $150$, where $\gamma_0$ is as in [25, Section 5]. These variants are denoted by DR50, DR100 and DR150, respectively.

As in [25, Section 5], we consider the problem of finding an $r$-sparse solution of a randomly generated linear system $Ax=b$. To be concrete, we set $C=\{x\in\mathbb{R}^n:Ax=b\}$ and $D=\{x\in\mathbb{R}^n:\|x\|_0\le r,\ \|x\|_\infty\le10^6\}$; here $\|x\|_0$ denotes the cardinality of $x$ and $\|x\|_\infty$ is the $\ell_\infty$ norm of $x$. For the set $C$, we first generate an $m\times n$ matrix $A$ and a vector $\hat x\in\mathbb{R}^r$ with $r=\cdots$, both with i.i.d. standard Gaussian entries. We then set $\tilde x$ to be the $n$-dimensional zero vector and assign the entries of $\hat x$ to $r$ randomly chosen entries of $\tilde x$. We further project this $\tilde x$ onto $[-10^6,10^6]^n$ so that $\tilde x\in D$. Finally, we set $b=A\tilde x$. Consequently, the intersection $C\cap D$ is nonempty for the instance generated because it contains $\tilde x$. In particular, this means that the globally optimal value of $\min_u\{\frac12 d_C^2(u):u\in D\}$ is zero.
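The construction above guarantees $\tilde x\in C\cap D$ by design, which the following sketch makes explicit. The value of $r$ is left as a parameter, and the function name and seed are hypothetical.

```python
import numpy as np

def make_sparse_system(m, n, r, M=1e6, seed=0):
    """Random r-sparse linear system in the style described above."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    x_tilde = np.zeros(n)
    x_tilde[rng.choice(n, size=r, replace=False)] = rng.standard_normal(r)
    x_tilde = np.clip(x_tilde, -M, M)        # projection onto [-M, M]^n, so x_tilde lies in D
    b = A @ x_tilde                          # hence x_tilde also lies in C = {x : Ax = b}
    return A, b, x_tilde

A, b, x_tilde = make_sparse_system(50, 200, 10, seed=7)
```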

In our experiments, we generate 50 random instances as described above for each pair $(m,n)$, where $m\in\{100,200,300,400,500\}$ and $n\in\{4000,5000,6000\}$. We report our results in Tables 3 and 4, where we present the number of iterations averaged over the 50 instances, the largest and smallest function values at termination, and also the number of successes and failures in identifying a sparse solution of the linear system. We also present the average number of iterations for successful instances (iter_s) and failed instances (iter_f).

In Table 3, we compare our PR splitting method with DR150. One can observe that this version of the DR splitting method outperforms the PR splitting method in terms of solution quality in this setting. However, the PR splitting method is consistently faster, and its performance becomes comparable with that of the DR splitting method for easier instances (larger $m$ and smaller $n/m$).

We also present in Table 4 the numerical results for DR50 and DR100. One can see that the DR splitting method becomes faster (while still slower than the PR splitting method) for these two smaller initial values of $\gamma$, at the price of fewer successful instances.

5 Concluding remarks 

In this paper, we studied the applicability of the PR splitting method for solving nonconvex optimization problems. We established global convergence of the method when applied to minimizing the sum of a strongly convex Lipschitz differentiable function $f$ and a proper closed function $g$, under suitable assumptions. Exploiting the possible nonconvexity of $g$, we showed how to suitably apply the PR splitting method to a large class of optimization problems whose objective is the sum of a convex (but not necessarily strongly convex) Lipschitz differentiable function and a proper closed function. This significantly broadens the applicability of the PR splitting method to cover feasibility problems and many constrained least squares problems.

For both methods, we report … .

We declare a failure if the function value at termination is above $10^{-6}$, and a success if the value is below $10^{-12}$.
Table 3: Comparing DR150 and PR splitting on random instances.

Data          DR150                                                PR
  m     n     iter  fval_max  fval_min  succ  fail  iter_s  iter_f    iter  fval_max  fval_min  succ  fail  iter_s  iter_f
100  4000     2073     3e-02     1e-16    36    14    1861    2617     297     6e-02     4e-05     0    50       -     297
100  5000     2931     3e-02     1e-16    12    38    1842    3275     367     5e-02     3e-05     0    50       -     367
100  6000     2014     2e-02     2e-16     5    45    1891    2028     431     5e-02     8e-08     0    49       -     423
200  4000      833     7e-02     3e-16    49     1     825    1219     189     2e-01     1e-15    15    35     227     173
200  5000      970     5e-02     2e-16    48     2     947    1528     230     1e-01     2e-15    11    39     297     211
200  6000     1254     4e-02     3e-16    44     6    1193    1704     277     1e-01     3e-15     4    46     344     271
300  4000      607     3e-15     2e-16    50     0     607       -     132     3e-01     9e-16    38    12     138     111
300  5000      705     3e-15     3e-16    50     0     705       -     163     2e-01     1e-15    24    26     181     146
300  6000      819     3e-15     4e-16    50     0     819       -     204     2e-01     2e-15    16    34     241     187
400  4000      523     3e-15     5e-17    50     0     523       -      95     2e-01     8e-16    44     6      96      91
400  5000      574     4e-15     2e-16    50     0     574       -     125     3e-01     1e-15    43     7     127     114
400  6000      655     4e-15     5e-16    50     0     655       -     156     3e-01     2e-15    27    23     165     145
500  4000      500     2e-16     7e-19    50     0     500       -     106     2e-01     6e-16    49     1      64    2173
500  5000      521     1e-15     4e-17    50     0     521       -      91     3e-01     1e-15    47     3      91      87
500  6000      560     4e-15     4e-16    50     0     560       -     123     3e-01     1e-15    47     3     124     108


Table 4: Computational results for DR50 and DR100.

Data          DR50                                                 DR100
  m     n     iter  fval_max  fval_min  succ  fail  iter_s  iter_f    iter  fval_max  fval_min  succ  fail  iter_s  iter_f
100  4000      336     4e-02     6e-16     1    49     423     334     854     2e-02     2e-16     5    45     716     870
100  5000      345     4e-02     3e-16     1    49     423     343     681     2e-02     4e-16     2    48     683     681
100  6000      349     3e-02     5e-03     0    50       -     349     647     2e-02     3e-16     1    49     715     646
200  4000      331     1e-01     4e-16    17    33     351     321     711     7e-02     8e-17    48     2     669    1728
200  5000      332     8e-02     9e-16     3    47     357     330     983     5e-02     1e-16    44     6     864    1857
200  6000      341     7e-02     5e-16     6    44     396     333    1186     4e-02     1e-16    24    26     802    1540
300  4000      319     2e-01     1e-16    45     5     315     353     489     3e-15     4e-16    50     0     489       -
300  5000      332     1e-01     5e-16    29    21     335     328     545     3e-15     4e-16    50     0     545       -
300  6000      341     1e-01     6e-16    16    34     378     323     674     5e-02     3e-16    49     1     651    1799
400  4000      271     3e-15     9e-16    50     0     271       -     405     4e-15     2e-16    50     0     405       -
400  5000      301     1e-01     8e-16    48     2     296     413     453     4e-15     5e-16    50     0     453       -
400  6000      329     1e-01     5e-16    40    10     330     329     516     4e-15     5e-16    50     0     516       -
500  4000      244     5e-15     2e-16    50     0     244       -     363     3e-15     2e-16    50     0     363       -
500  5000      269     4e-15     7e-16    50     0     269       -     404     5e-15     3e-16    50     0     404       -
500  6000      295     5e-15     4e-16    50     0     295       -     442     5e-15     9e-16    50     0     442       -


Appendix: Concrete numerical examples 

In this appendix, we provide some simple and concrete examples illustrating the different behaviors of the classical PR splitting method, the classical DR splitting method and our proposed PR splitting method (33).

The first example shows that, even in the convex setting, the classical PR splitting method can be faster than the classical DR splitting method, and our proposed PR method can outperform the classical DR method for some particular choices of the parameter $\gamma$. The second example, on a nonconvex feasibility problem, shows that the classical PR method can fail to converge while our proposed PR method converges linearly to a solution of the feasibility problem.

Example 1. (Classical DR splitting method vs classical/proposed PR method) Consider $f(x)=\|x\|^2$ and $g(x)=0$ for all $x\in\mathbb{R}^n$. Then, a direct verification shows that, for any $\gamma>0$,

$$\mathrm{prox}_{\gamma f}(z)=\mathop{\rm argmin}_u\Big\{\|u\|^2+\frac{1}{2\gamma}\|u-z\|^2\Big\}=\frac{z}{2\gamma+1}$$

and

$$\mathrm{prox}_{\gamma g}(z)=\mathop{\rm argmin}_u\Big\{\frac{1}{2\gamma}\|u-z\|^2\Big\}=z.$$

Thus, the classical DR method reads

$$x^{t+1}=\frac12\big(I+(2\,\mathrm{prox}_{\gamma g}-I)\circ(2\,\mathrm{prox}_{\gamma f}-I)\big)(x^t)=\frac{1}{2\gamma+1}x^t,$$

while the classical PR method reads

$$x^{t+1}=(2\,\mathrm{prox}_{\gamma g}-I)\circ(2\,\mathrm{prox}_{\gamma f}-I)(x^t)=\frac{1-2\gamma}{2\gamma+1}x^t=\Big(1-\frac{4\gamma}{2\gamma+1}\Big)x^t.$$

Thus, for this example, the classical PR method converges faster than the classical DR method when $\gamma\in(0,1)$, since then $|1-2\gamma|<1$.
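Because both iterations act componentwise and linearly here, a scalar sketch suffices to confirm the two contraction factors. The helper names below are hypothetical; `prox_f` and `prox_g` implement the two formulas displayed above.

```python
def prox_f(z, gamma):
    """prox of gamma*f for f(x) = ||x||^2, applied to one component."""
    return z / (2.0 * gamma + 1.0)

def prox_g(z, gamma):
    """prox of gamma*g for g = 0 is the identity."""
    return z

def dr_step(x, gamma):
    """Classical DR: x <- 0.5 * (x + (2 prox_g - I)(2 prox_f - I) x)."""
    r = 2.0 * prox_f(x, gamma) - x          # reflected prox of f
    return 0.5 * (x + 2.0 * prox_g(r, gamma) - r)

def pr_step(x, gamma):
    """Classical PR: x <- (2 prox_g - I)(2 prox_f - I) x."""
    r = 2.0 * prox_f(x, gamma) - x
    return 2.0 * prox_g(r, gamma) - r

g = 0.3
dr_factor, pr_factor = dr_step(1.0, g), pr_step(1.0, g)
```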

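Moreover, let $\beta=2.5$ and $\gamma<\cdots$. Then, the proposed PR method (33) reads

$$\begin{aligned}y^{t+1}&=\mathop{\rm argmin}_y\Big\{\frac72\|y\|^2+\frac{1}{2\gamma}\|y-x^t\|^2\Big\}=\frac{1}{1+7\gamma}x^t,\\ z^{t+1}&=\mathop{\rm argmin}_z\Big\{-\frac52\|z\|^2+\frac{1}{2\gamma}\|2y^{t+1}-x^t-z\|^2\Big\}=\frac{1}{1-5\gamma}\big(2y^{t+1}-x^t\big),\\ x^{t+1}&=x^t+2(z^{t+1}-y^{t+1})=\Big(1-\frac{4\gamma}{(1-5\gamma)(1+7\gamma)}\Big)x^t.\end{aligned}\qquad(50)$$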


Note that, for $\gamma=0.01$, which satisfies the above bound on $\gamma$, we have

$$0<1-\frac{4\gamma}{(1-5\gamma)(1+7\gamma)}<0.97<\frac{1}{2\gamma+1}.$$

Thus, for $\gamma=0.01$, our proposed PR method (33) is faster than the classical DR method for this example.
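The factor in (50) can be confirmed by running one pass of the three updates on a scalar. The coefficients $7$ and $5$ in (50) correspond to $(1+\beta)L_F$ and $\beta L_F$ with $\beta=2.5$ and $L_F=2$ (the gradient of $\|x\|^2$ is $2$-Lipschitz); this reading of the constants and the helper names are our own.

```python
def proposed_step(x, gamma):
    """One pass of (50): beta = 2.5, L_F = 2, so (1 + beta)*L_F = 7 and beta*L_F = 5.
    The z-update is well defined only for gamma < 1/5."""
    y = x / (1.0 + 7.0 * gamma)
    z = (2.0 * y - x) / (1.0 - 5.0 * gamma)
    return x + 2.0 * (z - y)

def proposed_factor(gamma):
    """Closed-form contraction factor from (50)."""
    return 1.0 - 4.0 * gamma / ((1.0 - 5.0 * gamma) * (1.0 + 7.0 * gamma))

g = 0.01
step_value = proposed_step(1.0, g)
```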

Example 2. (Classical PR method vs the proposed PR method) Let $C=\{(0,0)\}$ and $D=(\{0\}\times\mathbb{R})\cup(\mathbb{R}\times\{0\})$. We consider the feasibility problem of finding a point in the intersection of $C$ and $D$. We start with the initial point $x^0=(a,0)$ with $a\neq0$. Then, the classical PR splitting method applied to $f(x)=\delta_C(x)$ and $g(x)=\delta_D(x)$ for all $x\in\mathbb{R}^2$ reduces to

$$x^{t+1}=(2\,\mathrm{prox}_{\gamma g}-I)\circ(2\,\mathrm{prox}_{\gamma f}-I)(x^t)=(2P_D-I)\circ(2P_C-I)(x^t)=-x^t.$$

Thus, the classical PR splitting method does not converge: it cycles between the two points $(a,0)$ and $(-a,0)$. On the other hand, let $\beta=5$ and $\gamma\in(0,\cdots)$, and consider the proposed PR method (49) for feasibility problems. This algorithm reads

$$\begin{aligned}y^{t+1}&=\frac{\gamma P_C\big(\frac{x^t}{1+\beta\gamma}\big)+x^t}{(1+\beta)\gamma+1}=\frac{x^t}{6\gamma+1},\\ z^{t+1}&\in P_D\Big(\frac{2y^{t+1}-x^t}{1-\beta\gamma}\Big)=\Big\{\frac{2y^{t+1}-x^t}{1-5\gamma}\Big\},\\ x^{t+1}&=x^t+2(z^{t+1}-y^{t+1})=\Big(1-\frac{2\gamma}{(1-5\gamma)(6\gamma+1)}\Big)x^t,\end{aligned}\qquad(51)$$

where the formula for the $z$-update follows from the fact that $x^t,y^{t+1}\in\mathbb{R}\times\{0\}\subseteq D$, and hence so is $(2y^{t+1}-x^t)/(1-5\gamma)$ by construction. Hence, the proposed PR method (51) converges to $(0,0)\in C\cap D$ linearly in this case.
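Both behaviors in Example 2 can be reproduced with a few lines of Python. In the sketch below, `proj_D` returns one projection onto the union of the two axes (zeroing the smaller coordinate, ties broken toward the first axis), $P_C$ is the constant map to the origin, and $\gamma=0.05$ is an assumed value taken to lie in the admissible interval for $\beta=5$ (it satisfies $\gamma<1/\beta$). All names are hypothetical.

```python
def proj_D(p):
    """One projection onto D = ({0} x R) U (R x {0}): zero out the smaller coordinate."""
    a, b = p
    return (a, 0.0) if abs(a) >= abs(b) else (0.0, b)

def classical_pr_step(x):
    # P_C maps every point to (0, 0), so the reflection 2*P_C - I is just -I
    r = (-x[0], -x[1])
    p = proj_D(r)
    return (2 * p[0] - r[0], 2 * p[1] - r[1])

def proposed_pr_step(x, gamma, beta=5.0):
    """One pass of (49) with C = {(0, 0)}, so the P_C term in the y-update vanishes."""
    s = (1.0 + beta) * gamma + 1.0
    y = (x[0] / s, x[1] / s)
    v = ((2 * y[0] - x[0]) / (1 - beta * gamma), (2 * y[1] - x[1]) / (1 - beta * gamma))
    z = proj_D(v)
    return (x[0] + 2 * (z[0] - y[0]), x[1] + 2 * (z[1] - y[1]))

x0 = (3.0, 0.0)
x1 = classical_pr_step(x0)
x = x0
for _ in range(200):
    x = proposed_pr_step(x, 0.05)
```

With these values the factor in (51) is about $0.897$, so 200 iterations shrink the first coordinate by roughly nine orders of magnitude, while the classical iteration simply flips the sign of $x^t$.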